[57] ABSTRACT

This invention relates to customized electronic identification of desirable objects, such as news articles, in an electronic media environment, and in particular to a system that automatically constructs both a "target profile" for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a "target profile interest summary" for each user, which target profile interest summary describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank-ordered listing of target objects most likely to be of interest to each user so that the user can select from among these potentially relevant target objects, which were automatically selected by this system from the plethora of target objects that are profiled on the electronic media. Users' target profile interest summaries can be used to efficiently organize the distribution of information in a large-scale system consisting of many users interconnected by means of a communication network. Additionally, a cryptographically-based pseudonym proxy server is provided to ensure the privacy of a user's target profile interest summary, by giving the user control over the ability of third parties to access this summary and to identify or contact the user.
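As an illustration only, the profile construction and ranking summarized above can be sketched in Python with conventional TF-IDF weighting and cosine similarity. The tokenizer, the log(N/df) form of the inverse document frequency, cosine similarity as the match score, and the hand-written search profile standing in for a user's target profile interest summary are all assumptions made for this sketch, not details stated in the abstract.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_target_profiles(articles: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return a TF-IDF target profile (word -> weight) for every article.

    TF is the word's share of the article's words; IDF is log(N / df), where
    N is the number of articles and df the number of articles containing the word.
    """
    tokenized = {aid: tokenize(text) for aid, text in articles.items()}
    n_articles = len(tokenized)
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    profiles = {}
    for aid, words in tokenized.items():
        counts = Counter(words)
        total = len(words) or 1  # guard against empty articles
        profiles[aid] = {
            word: (count / total) * math.log(n_articles / doc_freq[word])
            for word, count in counts.items()
        }
    return profiles


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity of two sparse profiles; 0.0 if either is all zeros."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def rank_for_user(profiles: dict[str, dict[str, float]],
                  search_profile: dict[str, float]) -> list[str]:
    """Rank article ids by decreasing similarity to one (hypothetical) search profile."""
    return sorted(profiles, key=lambda aid: cosine(profiles[aid], search_profile),
                  reverse=True)


articles = {
    "a1": "Stock market rally lifts tech shares",
    "a2": "Local team wins football final",
    "a3": "Tech shares fall as market cools",
}
profiles = build_target_profiles(articles)
print(rank_for_user(profiles, {"tech": 1.0, "shares": 1.0, "rally": 1.0}))  # ['a1', 'a3', 'a2']
```

Words that occur in several articles (here "market", "tech" and "shares") receive lower weights than words confined to a single article, which is the "relative to its overall frequency of use in all articles" effect the abstract describes.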
PSEUDONYMOUS SERVER FOR SYSTEM FOR CUSTOMIZED ELECTRONIC IDENTIFICATION OF DESIRABLE OBJECTS

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is a continuation-in-part of U.S. patent application Ser. No. 08/346,425, filed Nov. 29, 1994 and titled "SYSTEM AND METHOD FOR SCHEDULING BROADCAST OF AND ACCESS TO VIDEO PROGRAMS AND OTHER DATA USING CUSTOMER PROFILES", which application is assigned to the same assignee as the present application.

FIELD OF INVENTION

This invention relates to customized electronic identification of desirable objects, such as news articles, in an electronic media environment, and in particular to a system that automatically constructs both a "target profile" for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a "target profile interest summary" for each user, which target profile interest summary describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank-ordered listing of target objects most likely to be of interest to each user so that the user can select from among these potentially relevant target objects, which were automatically selected by this system from the plethora of target objects that are profiled on the electronic media. Users' target profile interest summaries can be used to efficiently organize the distribution of information in a large-scale system consisting of many users interconnected by means of a communication network. Additionally, a cryptographically based proxy server is provided to ensure the privacy of a user's target profile interest summary, by giving the user control over the ability of third parties to access this summary and to identify or contact the user.

It is a problem in the field of electronic media to enable a user to access information of relevance and interest to the user without requiring the user to expend an excessive amount of time and energy searching for the information. Electronic media, such as on-line information sources, provide a vast amount of information to users, typically in the form of "articles," each of which comprises a publication item or document that relates to a specific topic. The difficulty with electronic media is that the amount of information available to the user is overwhelming and the article repository systems that are connected on-line are not organized in a manner that sufficiently simplifies access to only the articles of interest to the user. Presently, a user either fails to access relevant articles because they are not easily identified or expends a significant amount of time and energy to conduct an exhaustive search of all articles to identify those most likely to be of interest to the user. Furthermore, even if the user conducts an exhaustive search, present information searching techniques do not necessarily extract only the most relevant articles accurately, but also present articles of marginal relevance due to the functional limitations of the information searching techniques. There is also no existing system which automatically estimates the inherent quality of an article or other target object to distinguish among a number of articles or target objects identified as of possible interest to a user.

Therefore, in the field of information retrieval, there is a long-standing need for a system which enables users to navigate through the plethora of information. With the commercialization of communication networks, such as the Internet, the growth of available information has increased. Customization of the information delivery process to the user's unique tastes and interests is the ultimate solution to this problem. However, the techniques which have been proposed to date either address the user's interests only on a superficial level or provide greater depth and intelligence at the cost of unwanted demands on the user's time and energy. While many researchers have agreed that traditional methods have been lacking in this regard, no one to date has successfully addressed these problems in a holistic manner and provided a system that can fully learn and reflect the user's tastes and interests. This is particularly true in a practical commercial context, such as on-line services available on the Internet. There is a need for an information retrieval system that is largely or entirely passive, unobtrusive, undemanding of the user, and yet both precise and comprehensive in its ability to learn and truly represent the user's tastes and interests. Present information retrieval systems require the user to specify the desired information retrieval behavior through cumbersome interfaces.

Users may receive information on a computer network either by actively retrieving the information or by passively receiving information that is sent to them. Just as users of information retrieval systems face the problem of too much information, so do users who are targeted with electronic junk mail by individuals and organizations. An ideal system would protect the user from unsolicited advertising, both by automatically extracting only the most relevant messages received by electronic mail, and by preserving the confidentiality of the user's preferences, which should not be freely available to others on the network.

Researchers in the field of published article information retrieval have devoted considerable effort to finding efficient and accurate methods of allowing users to select articles of interest from a large set of articles. The most widely used methods of information retrieval are based on keyword matching: the user specifies a set of keywords which the user thinks are exclusively found in the desired articles, and the information retrieval computer retrieves all articles which contain those keywords. Such methods are fast but are notoriously unreliable, as users may not think of the right keywords, or the keywords may be used in unwanted articles in an irrelevant or unexpected context. As a result, the information retrieval computers retrieve many articles which are unwanted by the user. The logical combination of keywords and the use of wild-card search parameters help improve the accuracy of keyword searching but do not completely solve the problem of inaccurate search results.

Starting in the 1960's, an alternate approach to information retrieval was developed: users were presented with an article and asked if it contained the information they wanted, or to quantify how close the information contained in the article was to what they wanted. Each article was described by a profile which comprised either a list of the words in the article or, in more advanced systems, a table of word frequencies in the article. Since a measure of similarity between articles is the distance between their profiles, the measured similarity of article profiles can be used in article retrieval. For example, a user searching for information on a subject can write a short description of the desired information. The information retrieval computer generates an
article profile for the request and then retrieves articles with profiles similar to the profile generated for the request. These requests can then be refined using "relevance feedback," where the user actively or passively rates the retrieved articles according to how close their information is to what is desired. The information retrieval computer then uses this relevance feedback information to refine the request profile, and the process is repeated.

A number of researchers have looked at methods for selecting articles of most interest to users. An article titled "Social Information Filtering: algorithms for automating 'word of mouth'" was published in the CHI '95 Proceedings by Pattie Maes et al. and describes the Ringo information retrieval system, which recommends musical selections. The Ringo system requires active feedback from the users: users must manually specify how much they like or dislike each musical selection. The Ringo system maintains a complete list of users' ratings of music selections and makes recommendations by finding which selections were liked by multiple people. However, the Ringo system does not take advantage of any available descriptions of the music, such as structured descriptions in a database, or free text such as that contained in music reviews. An article titled "Evolving agents for personalized information filtering," published in the Proc. 9th IEEE Conf. on AI for Applications by Sheth and Maes, described the use of agents for information filtering which use genetic algorithms to learn to categorize Usenet news articles. In this system, users must define news categories and the users actively indicate their opinion of the selected articles. Their system uses a list of keywords to represent sets of articles, and the records of users' interests are updated using genetic algorithms.

A number of other research groups have looked at the automatic generation and labeling of clusters of articles for the purpose of browsing through the articles. A group at Xerox PARC published a paper titled "Scatter/gather: a cluster-based approach to browsing large article collections" at the 15 Ann. Int'l SIGIR '92, ACM 318-329 (Cutting et al. 1992). This group developed a method they call "scatter/gather" for performing information retrieval searches. In this method, a collection of articles is "scattered" into a small number of clusters, and the user then chooses one or more of these clusters based on short summaries of the clusters. The chosen clusters are then "gathered" into a subcollection, and the process is repeated. Each iteration of this process is expected to produce a smaller, more focused collection. The cluster "summaries" are generated by picking those words which appear most frequently in the cluster and the titles of those articles closest to the center of the cluster. However, no feedback from users is collected or stored, so no performance improvement occurs over time. Apple's Advanced Technology Group has developed an interface based on the concept of a "pile of articles." This interface is described in an article titled "A 'pile' metaphor for supporting casual organization of information," published in Human Factors in Computer Systems, CHI '92 Conf. Proc. 627-634, by R. Mander, G. Salomon and Y. Wong, 1992. Another article, titled "Content awareness in a file system interface: implementing the 'pile' metaphor for organizing information," was published in 16 Ann. Int'l SIGIR '93, ACM 260-269, by D. E. Rose et al. The Apple interface uses word frequencies to automatically file articles by picking the pile most similar to the article being filed. This system functions to cluster articles into subpiles, determine key words for indexing by picking the words with the largest TF/IDF (where TF is term (word) frequency and IDF is the inverse document frequency), and label piles by using the determined key words.

Numerous patents address information retrieval methods, but . . . records of a user's interest based on passive monitoring of which articles the user accesses. None of the systems described . . . computer architectures to allow fast retrieval of articles distributed . . . for purposes of commerce or of . . . of users' interests. U.S. Pat. No. 5,321,833 issued to Chang et al. teaches a method in which users select terms to use in a retrieval query and specify the weightings of the different terms. The Chang system then calculates multiple levels of weighted criteria. U.S. Pat. No. 5,301,109 teaches a method for retrieving articles in multiple languages by constructing vectors (SVD or PCA vectors) which represent correlations between the different words. U.S. Pat. No. 5,331,554 issued to Graham et al. discloses a method for retrieving portions of a manual by comparing a query with nodes in a tree. U.S. Pat. No. 5,331,556 addresses techniques for deriving morphological part-of-speech information and thus for making use of the similarities of different forms of the same word (e.g. "article" and "articles").

Therefore, there presently is no information retrieval system operable in an electronic media environment that enables a user to access information of relevance and interest to the user without requiring the user to expend an excessive amount of time and energy.

SOLUTION

The above-described problems are solved and a technical advance achieved in the field by the system for customized electronic identification of desirable objects in an electronic media environment, which system enables a user to access target objects of relevance and interest to the user without requiring the user to expend an excessive amount of time and energy. Profiles of the target objects are stored on electronic media and are accessible via a data communication network. In many applications, the target objects are informational in nature, and so may themselves be stored on electronic media and be accessible via a data communication network.

Relevant definitions of terms for the purpose of this description include:
(a.) an object available for access by the user, which may be either physical or electronic in nature, is termed a "target object";
(b.) a digitally represented profile indicating that target object's attributes is termed a "target profile";
(c.) the user looking for the target object is termed a "user";
(d.) a profile holding that user's attributes, including age/zip code/etc., is termed a "user profile";
(e.) a summary of digital profiles of target objects that a user likes and/or dislikes is termed the "target profile interest summary" of that user;
(f.) a profile consisting of a collection of attributes, such that a user likes target objects whose profiles are similar to this collection of attributes, is termed a "search profile" or, in some contexts, a "query" or "query profile";
(g.) a specific embodiment of the target profile interest summary which comprises a set of search profiles is termed the "search profile set" of a user;
(h.) a collection of target objects with similar profiles is termed a "cluster";
(i.) an aggregate profile formed by averaging the attributes of all target objects in a cluster is termed a "cluster profile";
(j.) a real number determined by calculating the statistical variance of
the profiles of all target objects in a cluster is termed a "cluster variance"; and
(k.) a real number determined by calculating the maximum distance between the profiles of any two target objects in a cluster is termed a "cluster diameter."

The system for electronic identification of desirable objects of the present invention automatically constructs both a target profile for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a "target profile interest summary" for each user, which target profile interest summary describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank-ordered listing of target objects most likely to be of interest to each user so that the user can select from among these potentially relevant target objects, which were automatically selected by this system from the plethora of target objects available on the electronic media.

Because people have multiple interests, a user's target profile interest summary may comprise a set of individual search profiles, each of which identifies one of the user's areas of interest. Each user is presented with those target objects whose profiles most closely match the user's interests as described by the user's target profile interest summary. Users' target profile interest summaries are automatically updated on a continuing basis to reflect each user's changing interests. Target objects can be grouped into clusters based on their similarity to each other, for example when the target objects are published articles, and menus can be automatically generated for each cluster of target objects to allow users to navigate throughout the clusters and manually locate target objects of interest.

For reasons of confidentiality and privacy, a particular user may not wish to make public all of the interests recorded in the user's target profile interest summary, particularly when these interests are determined by the user's purchasing patterns. The user may desire that all or part of the target profile interest summary be kept confidential, such as information relating to the user's political, religious, financial or purchasing behavior; indeed, confidentiality with respect to purchasing behavior is the user's legal right in many states. It is therefore necessary that data in a user's target profile interest summary be protected from unwanted disclosure except with the user's agreement. At the same time, the user's target profile interest summaries must be accessible to the servers that perform the matching of target objects to the users, if the benefit of this matching is desired by both providers and consumers of the target objects. The disclosed system provides a solution to the privacy problem by using a proxy server which acts as an intermediary between the information provider and the user. The proxy server dissociates the user's true identity from the pseudonym by the use of cryptographic techniques. The proxy server also permits users to control access to their target profile interest summaries and/or user profiles, including provision of this information to marketers and advertisers if they so desire, possibly in exchange for cash or other considerations. Marketers may purchase these profiles in order to target advertisements to particular users, or they may purchase partial user profiles, which do not include enough information to identify the individual users in question, in order to carry out standard kinds of demographic analysis and market research on the resulting database of partial user profiles.

In the preferred embodiment of the invention, the system for customized electronic identification of desirable objects uses a fundamental methodology for accurately and efficiently matching users and target objects by automatically calculating, using and updating profile information that describes both the users' interests and the target objects' characteristics. The target objects may be published articles, purchasable items, or even other people, and their properties are stored and/or represented and/or denoted on the electronic media as (digital) data. Examples of target objects can include, but are not limited to: a newspaper story of potential interest, a movie to watch, an item to buy, e-mail to receive, or another person to correspond with. In all these cases, the information delivery process in the preferred embodiment is based on determining the similarity between a profile for the target object and the profiles of target objects for which the user (or a similar user) has provided positive feedback. The individual data that describe a target object and constitute the target object's profile are herein termed "attributes" of the target object. Attributes may include, but are not limited to, the following: (1) long pieces of text (a newspaper story, a movie review, a product description or an advertisement), (2) short pieces of text (name of a movie's director, name of town from which an advertisement was placed, name of the language in which an article was written), (3) numeric measurements (price of a product, rating given to a movie, reading level of a book), (4) associations with other types of objects (list of actors in a movie, list of persons who have read a document). Any of these attributes, but especially the numeric ones, may correlate with the quality of the target object, such as measures of its popularity (how often it is accessed) or of user satisfaction (number of complaints received).

The preferred embodiment of the system for customized electronic identification of desirable objects operates in an electronic media environment for accessing these target objects, which may be news, electronic mail, other published documents, or product descriptions. The system in its broadest construction comprises three conceptual modules, which may be separate entities distributed across many implementing systems, or combined into a lesser subset of physical entities. The specific embodiment of this system disclosed herein illustrates the use of a first module which automatically constructs a "target profile" for each target object in the electronic media based on various descriptive attributes of the target object. A second module uses interest feedback from users to construct a "target profile interest summary" for each user, for example in the form of a "search profile set" consisting of a plurality of search profiles, each of which corresponds to a single topic of high interest for the user. The system further includes a profile processing module which estimates each user's interest in various target objects by reference to the users' target profile interest summaries, for example by comparing the target profiles of these target objects against the search profiles in users' search profile sets, and generates for each user a customized rank-ordered listing of target objects most likely to be of interest to that user. Each user's target profile interest summary is automatically updated on a continuing basis to reflect the user's changing interests.

Target objects may be of various sorts, and it is sometimes advantageous to use a single system that delivers and/or clusters target objects of several distinct sorts at once, in a unified framework. For example, users who exhibit a strong interest in certain novels may also show an interest in certain movies, presumably of a similar nature. A system in which some target objects are novels and other target objects are movies can discover such a correlation and exploit it in order to group particular novels with particular movies, e.g., for
clustering purposes, or to recommend the movies to a user who has demonstrated interest in the novels. Similarly, if users who exhibit an interest in certain World Wide Web sites also exhibit an interest in certain products, the system can match the products with the sites and thereby recommend to the marketers of those products that they place advertisements at those sites, e.g., in the form of hypertext links to their own sites.
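One simple way such cross-type correlations might be discovered is to profile every target object, whatever its sort, by the set of users who have shown interest in it (an "associations with other types of objects" attribute) and to compare those sets directly. The Python sketch below assumes that representation, uses Jaccard similarity as the comparison, and uses hypothetical object ids of the form "kind:name" and made-up user names; none of these choices is specified in the text.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of users interested in both objects among users interested in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_cross_matches(
    interested: dict[str, set[str]], source_kind: str, target_kind: str
) -> dict[str, tuple[str, float]]:
    """For each object of source_kind, find the most similar object of target_kind.

    Object ids are assumed (hypothetically) to look like "kind:name".
    """
    def kind(obj_id: str) -> str:
        return obj_id.split(":", 1)[0]

    targets = [t for t in interested if kind(t) == target_kind]
    matches = {}
    for src, users in interested.items():
        if kind(src) != source_kind or not targets:
            continue
        best = max(targets, key=lambda t: jaccard(users, interested[t]))
        matches[src] = (best, jaccard(users, interested[best]))
    return matches


interested = {
    "novel:spy_thriller": {"ann", "bob", "cy"},
    "novel:romance": {"dee", "eve"},
    "movie:heist": {"ann", "bob", "fay"},
    "movie:period_drama": {"dee", "eve", "gus"},
}
# spy_thriller pairs with heist (0.5); romance pairs with period_drama (about 0.67)
print(best_cross_matches(interested, "novel", "movie"))
```

The same function would pair World Wide Web sites with products if the ids used kinds such as "site" and "product", which is the advertising-placement case mentioned above.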

The ability to measure the similarity of profiles describing target objects and a user's interests can be applied in two basic ways: filtering and browsing. Filtering is useful when large numbers of target objects are described in the electronic media space. These target objects can, for example, be articles that are received or potentially received by a user, who only has time to read a small fraction of them. For example, one might potentially receive all items on the AP news wire service, all items posted to a number of newsgroups, all advertisements in a set of newspapers, or all unsolicited electronic mail, but few people have the time or inclination to read so many articles. A filtering system in the system for customized electronic identification of desirable objects automatically selects a set of articles that the user is likely to wish to read. The accuracy of this filtering system improves over time by noting which articles the user reads and by generating a measurement of the depth to which the user reads each article. This information is then used to update the user's target profile interest summary. Browsing provides an alternate method of selecting a small subset of a large number of target objects, such as articles. Articles are organized so that users can actively navigate among groups of articles by moving from one group to a larger, more general group, to a smaller, more specific group, or to a closely related group. Each individual article forms a one-member group of its own, so that the user can navigate to and from individual articles as well as larger groups.
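The filtering loop described above, scoring incoming articles and then adjusting the interest summary from what the user actually read, can be sketched as follows. This is a sketch under stated assumptions: the interest summary is taken to be a search profile set of sparse word-weight profiles, an article's score is its cosine similarity to the closest search profile, and the update is a simple moving average toward the read article scaled by reading depth (a Rocchio-style rule). The function names, the `rate` parameter and the depth scaling are hypothetical; the text does not give an update formula.

```python
import math

Profile = dict[str, float]


def cosine(p: Profile, q: Profile) -> float:
    """Cosine similarity of two sparse profiles; 0.0 if either is all zeros."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def filter_articles(articles: dict[str, Profile], search_profiles: list[Profile],
                    k: int) -> list[str]:
    """Keep the k articles whose best-matching search profile scores highest."""
    scores = {
        aid: max((cosine(prof, sp) for sp in search_profiles), default=0.0)
        for aid, prof in articles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def update_from_reading(search_profiles: list[Profile], article: Profile,
                        depth: float, rate: float = 0.5) -> None:
    """Pull the closest search profile toward an article the user read.

    depth is the fraction of the article read (clamped to 0.0-1.0), so an
    article that was only skimmed moves the profile less than one read in full.
    """
    if not search_profiles:
        return
    step = rate * min(max(depth, 0.0), 1.0)
    closest = max(search_profiles, key=lambda sp: cosine(sp, article))
    for term in set(closest) | set(article):
        closest[term] = (1 - step) * closest.get(term, 0.0) + step * article.get(term, 0.0)


search_profiles = [{"chess": 1.0}, {"jazz": 1.0}]
articles = {
    "x": {"chess": 1.0},
    "y": {"cooking": 1.0},
    "z": {"jazz": 1.0, "piano": 1.0},
}
print(filter_articles(articles, search_profiles, k=2))  # ['x', 'z']
update_from_reading(search_profiles, {"jazz": 0.5, "piano": 0.5}, depth=1.0)
print(search_profiles[1])  # {'jazz': 0.75, 'piano': 0.25}
```

Keeping one search profile per topic means a newly read jazz article only shifts the jazz profile, leaving the user's separate chess interest untouched.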

Articles are grouped into clusters, and clusters are merged into larger clusters; the resulting hierarchies of clusters then form the basis for menuing and navigational systems to allow the rapid searching of large numbers of articles. This same clustering technique is applicable to any type of target object that can be profiled on the electronic media.
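The cluster quantities defined earlier in this description (cluster profile, cluster variance, cluster diameter) and a cluster hierarchy of the kind that could back a browsing menu can be sketched as follows. The sketch assumes every target profile is a fixed-length numeric vector compared by Euclidean distance, reads "statistical variance" as the mean squared distance from the cluster profile, and builds the hierarchy by greedily merging the two clusters with the closest profiles; these are illustrative assumptions, not the clustering procedure the patent itself specifies.

```python
import math
from itertools import combinations

Vector = tuple[float, ...]


def cluster_profile(members: list[Vector]) -> Vector:
    """Definition (i.): average the attributes of all target objects in the cluster."""
    return tuple(sum(column) / len(members) for column in zip(*members))


def cluster_variance(members: list[Vector]) -> float:
    """Definition (j.), read here as the mean squared distance from the cluster profile."""
    center = cluster_profile(members)
    return sum(math.dist(m, center) ** 2 for m in members) / len(members)


def cluster_diameter(members: list[Vector]) -> float:
    """Definition (k.): the largest distance between any two member profiles."""
    return max((math.dist(a, b) for a, b in combinations(members, 2)), default=0.0)


def build_hierarchy(profiles: dict[str, Vector]):
    """Greedy agglomerative clustering into a nested-tuple tree of object ids.

    At each step the two clusters whose cluster profiles are closest are merged,
    so tight groups sit deep in the tree and broad groups near the root.
    """
    if not profiles:
        return None
    clusters = [(name, [vec]) for name, vec in profiles.items()]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(
                cluster_profile(clusters[ij[0]][1]), cluster_profile(clusters[ij[1]][1])
            ),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


profiles = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (10.0, 10.0), "d": (10.0, 11.0)}
print(build_hierarchy(profiles))  # (('a', 'b'), ('c', 'd'))
print(cluster_diameter(list(profiles.values())))
```

Each nested tuple corresponds to one group a user could move into or out of while browsing, and the leaves are the one-member groups formed by individual target objects.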



times that minimm the traffic flow in the communication 
network to thereby effitienUy provide the desired informa- 
tion to the user and/or conserve valuable storage space by 
only storing those target objects (or segments thereof) which- 
5 are relevant to the user's interests. 

BRIEF DESCRIPTION OF THE DRAWING 

FID. 1 illustrates in block diagram form a typical archi- 
tecture of an dectronic media system in which the system 
to for customized dectronic identification of desirable objects 
of the present invention can be implemented as part of a user 
server system; 

FIG. 2 illustrates in block diagram form one embodiment 
of the system fear customized electronic identification of 
desirable objects; 
FIGS. 3 and 4 illustrate typical network trees; 
FIG. 5 illustrates, in flow diagram form a method for 
automatically generating article profiles and an associated 
hierarchical menu system; 
FIGS. 6-9 illustrate examples of menu generating pro- 



is 



20 



30 



There are a number of variations on the theme of devd- 
oping and using profiles for article retrieval, with the basic 
implementation of an on-line news clipping service repre- 
senting the preferred embodiment of the invention. Varia- 



45 



oons of this basic system are disdosed and comprise a 

system to filter electronic mail, an extension for retrieval of so human beings, movies, or mutual funds, ft is assumed that 



FIG. 1# illustrates in flow diagram form the operational 
steps taken by the system for customized dectronic identi- 
fication of desirable objects to screen articles for a user; 

FIG. 11 illustrates a hierarchical duster tree example; 

FIG. 12 illustrates in flow diagram form the process for 
detcrminatioD of likelihood of interest by a specific user in 
a sdected target object; 

FIGS. 13A-B illustrate in flow diagram form the auto- 
matic rlurtm'ng process; 

FIG. 14 illustrates in flow diagram form the use of the 
pseudonymous server; 

FIG. 15 illustrates in flow diagram form the use of the 
system for accessing information in response to a user 
query; and 

FIG. 16 illustrates in flow diagram form the use of the 
system for accessing information in response to a user query 
when the system is a distributed network implementation. 

DETAILED DESCRIPTION 
MEASURING SIMILARITY 

This section describes a general procedure for automati- 
cally measuring the similarity between two target objects, or, 
more precisely, between target profiles that are a utomatically 
generated for each of the two target objects. This similarity 
determination process is applicable to target objects in a 
wide variety of contexts. Target objects being compared can 
be. as an example but not limited to: textual documents. 



target objects such as purchasable items which may have 
more complex descriptions, a system to automatically build 
and alter menuing systems for browsing and searching 
through large numbers of target objects, and a system to 
construct virtual communities of people with common inter- 
ests. These intelligent fitters and browsers are necessary to 



the target profiles which describe the target objects are 
stored at one or more locations in a data communication 
network on data storage media associated with a computer 
system. The computed similarity measurements serve as 
55 input to additional processes, which function to enable 
human users to locate desired target objects using a large 



provide a truly passive, intelligent system interface. ^user'^ computer system. These additional processes estimate a 
irxbsfaceTOa^t^^ / human user* s interest in various target objects, or else duster 

scn^ for >u^>flrst<ime*an»intellige j a plurality of target objects in to logically coherent groups, 

the afliiiities>rjetweeniusm*an^^ detailed, *w The methods used by these additional processes might in 



comprehensive target profiles and user-specific target profile 
intnest summaries maWff the system to provide responsive 
routing of specific queries for user information access. The 
information maps so produced and the application of users* 



principle be implemented on either a single computer or on 
a computer network. Jointly or separately, they form the 
underpinning for various sorts of database systems and 
information retrieval systems. 



target profile interest summaries to predict the information 65 Target Objects and Attributes 



consumption patterns of a user allows for pre -caching of 
data at locations on the data communication network and at 



In classical Information Retrieval (IR) technology, the 
user is a literate human and the target objects in question are 
textual documents stored on data storage devices interconnected to the user via a computer network. That is, the target objects consist entirely of text and so are digitally stored on the data storage devices within the computer network. However, there are other target object domains that present related retrieval problems that are not capable of being solved by present information retrieval technology:

(a.) the user is a film buff and the target objects are movies available on videotape.
(b.) the user is a consumer and the target objects are used cars being sold.
(c.) the user is a consumer and the target objects are products being sold through promotional deals.
(d.) the user is an investor and the target objects are publicly traded stocks, mutual funds and/or real estate opportunities.
(e.) the user is a student and the target objects are classes being offered.
(f.) the user is an activist and the target objects are Congressional bills of potential concern.
(g.) the user is a direct-mail marketer and the target objects are potential customers.
(h.) the user is a net-surfer and the target objects are pages, servers, or newsgroups available on the World Wide Web.
(i.) the user is a philanthropist and the target objects are charities.
(j.) the user is ill and the target objects are medical specialists.
(k.) the user is an employee and the target objects are potential employers.
(l.) the user is an employer and the target objects are potential employees.
(m.) the user is a beleaguered executive and the target objects are electronic mail messages addressed to the user.
(n.) the user is a lonely heart and the target objects are potential conversation partners.
(o.) the user is in search of an expert and the target objects are users, with known retrieval habits, of a document retrieval system.
(p.) the user is a social worker and the target objects are families that may need extra visits.
(q.) the user is … and the target objects are women for whom mammograms may be ….
(r.) the user is an auto insurance company and the target objects are potential customers.

In all these cases, the user wishes to locate some small subset of the target objects, such as the target objects that the user most desires to rent, buy, investigate, meet, read, give mammograms to, insure, and so forth. The task is to help the user identify the most interesting target objects, where the user's interest in a target object is defined to be a numerical measurement of the user's relative desire to locate that object rather than others. … to solve … above. It is assumed … the system for customized electronic identification of desirable objects, and that, specifically, the system stores (or has the ability to reconstruct) several pieces of information about each target object. These pieces of information are termed "attributes": collectively, they are said to form a profile of the target object, or a "target profile." For example, where the system for customized electronic identification of desirable objects is activated to identify movies of interest, the system is likely to be concerned with the values of attributes such as these:

(a.) title of movie,
(b.) name of director,
(c.) Motion Picture Association of America (MPAA) child-appropriateness rating (0=G, 1=PG, ...),
(d.) date of release,
(e.) number of stars granted by a particular critic,
(f.) number of stars granted by a second critic,
(g.) number of stars granted by a third critic,
(h.) full text of review by the third critic,
(i.) list of customers who have previously rented this movie,
(j.) list of actors.

Each movie has a different set of values for these attributes. This example conveniently illustrates three kinds of attributes. Attributes c-g are numeric attributes, of the sort that might be found in a database record. It is evident that they can be used to help the user identify target objects (movies) of interest. For example, the user might previously have rented many Parental Guidance (PG) films, and many films made in the 1970s. This generalization is useful: new films with values for one or both attributes that are numerically similar to these (such as MPAA rating of 1, release date of 1975) are judged similar to the films the user already likes, and therefore of interest. Attributes a, b, and h are textual attributes. They too are important for helping the user locate desired films. For example, perhaps the user has shown an interest in films whose review text (attribute h) contains words like "chase," "explosion," "explosions," "hero," "gripping," and "superb." This generalization is also useful in identifying new films of interest. Attribute i is an associative attribute. It records associations between the target objects in this domain, namely movies, and ancillary target objects of an entirely different sort, namely humans. A good indication that the user wants to rent a particular movie is that the user has previously rented other movies with similar attribute values, and this holds for attribute i just as it does for attributes a-h. For example, if the user has often liked movies that customer #17 and customer #… have rented, then the user may like other such movies, which have similar values for attribute i. Attribute j is another example of an associative attribute, recording associations between target objects and actors. Note that any of these attributes can be made subject to authentication when the profile is constructed, through the use of signatures; … that identifies the target object and specifies its authentic value for attribute c.

These three kinds of attributes are common: numeric, textual, and associative. In the classical information retrieval domain, where the target objects are documents (or coherent sections extracted by a text segmentation method), the system might consider only a single attribute, the full text of the target object. However, a more sophisticated system would consider a longer target profile, including numeric and associative attributes:

(a.) full text of document (textual),
(b.) title (textual),
(c.) author (textual),
(d.) language in which document is written (textual),
(e.) date of creation (numeric),
(f.) date of last update (numeric),
(g.) length in words (numeric),
(h.) reading level (numeric),
(i.) quality of document as rated by a third-party editorial agency (numeric),
(j.) list of other readers who have retrieved this document (associative).

As another domain example, consider a domain where the user is an advertiser and the target objects are potential customers. The system might store the following attributes for each target object (potential customer):

(a.) first two digits of zip code (textual),
(b.) first three digits of zip code (textual),
(c.) entire five-digit zip code (textual),
(d.) distance of residence from advertiser's nearest physical storefront (numeric),
(e.) annual family income (numeric),
(f.) number of children (numeric),
(g.) list of previous items purchased by this potential customer (associative),
(h.) list of filenames stored on this potential customer's client computer (associative),
(i.) list of movies rented by this potential customer (associative),
(j.) list of investments in this potential customer's investment portfolio (associative),
(k.) list of documents retrieved by this potential customer (associative),
(l.) written response to Rorschach inkblot test (textual),
(m.) multiple-choice responses by this customer to 20 self-image questions (20 textual attributes).

As always, the notion is that similar consumers buy similar products. It should be noted that diverse sorts of information are being used here to characterize consumers, from their consumption patterns to their literary tastes and psychological peculiarities, and that this fact illustrates both the flexibility and power of the system for customized electronic identification of desirable objects of the present invention. Diverse sorts of information can be used as attributes in other domains as well (as when physical, economic, psychological and interest-related questions are used to profile the applicants to a dating service, which is indeed a possible domain for the present system), and the advertiser domain is simply an example.

As a final domain example, consider a domain where the user is a stock market investor and the target objects are publicly traded corporations. A great many attributes might be used to characterize each corporation, including but not limited to the following:

(a.) type of business (textual),
(b.) corporate mission statement (textual),
(c.) number of employees during each of the last 10 years (ten separate numeric attributes),
(d.) growth in number of employees during each of the last 10 years,
(e.) dividend payment issued in each of the last 40 quarters, as a percentage of current share price,
(f.) percentage appreciation of stock value during each of the last 40 quarters,
(g.) list of shareholders (associative),
(h.) composite text of recent articles about the corporation in the financial press (textual).

It is worth noting some additional attributes that are of interest in some domains. In the case of documents and certain other domains, it is useful to know the source of each target object (for example, refereed journal article vs. UPI newswire article vs. Usenet newsgroup posting vs. question-answer pair from a question-and-answer list vs. tabloid newspaper article vs. ...); the source may be represented as a single-term textual attribute. Important associative attributes for a hypertext document are the list of documents that it links to, and the list of documents that link to it. Documents with similar citations are similar with respect to the former attribute, and documents that are cited in the same places are similar with respect to the latter. A convention may optionally be adopted that any document also links to itself. Especially in systems where users can choose whether or not to retrieve a target object, a target object's popularity (or circulation) can be usefully measured as a numeric attribute specifying the number of users who have retrieved it. Other measurements that also indicate a kind of popularity include the number of replies to a target object, in the domain where target objects are messages posted to an electronic community such as a computer bulletin board or newsgroup, and the number of links pointing to a target object, in the domain where target objects are interlinked hypertext documents on the World Wide Web or a similar system. A target object may also receive explicit numeric evaluations (another kind of numeric attribute) from various groups, such as the Motion Picture Association of America (MPAA), as above, which rates movies' appropriateness for children, or the American Medical Association, which might rate the accuracy and novelty of medical research papers, or a random survey sample of users (chosen from all users or a selected set of experts), who could be asked to rate nearly anything. Certain other types of evaluation, which also yield numeric attributes, may be carried out mechanically. For example, the difficulty of reading a text can be assessed by standard procedures that count word and sentence lengths, while the vulgarity of a text could be defined as (say) the number of vulgar words it contains, and the expertise of a text could be crudely assessed by counting the number of similar texts its author had previously retrieved and read using the invention, perhaps confining this count to texts that have high approval ratings from critics. Finally, it is possible to synthesize certain textual attributes mechanically, for example to reconstruct the script of a movie by applying speech recognition techniques to its soundtrack or by applying optical character recognition techniques to its closed-caption subtitles.

Decomposing Complex Attributes

Although textual and associative attributes are large and complex pieces of data, for information retrieval purposes they can be decomposed into smaller, simpler numeric attributes. This means that any set of attributes can be replaced by a (usually larger) set of numeric attributes, and hence that any profile can be represented as a vector of numbers denoting the values of these numeric attributes. In particular, a textual attribute, such as the full text of a movie review, can be replaced by a collection of numeric attributes that represent scores to denote the presence and significance of the words "aback," "abacus," and so on through "zymurgy" in that text. The score of a word in a text may be defined in numerous ways. The simplest definition is that the score is the rate of the word in the text, which is computed by counting the number of times the word occurs in the text, and dividing this number by the total number of words in the text. This sort of score is often called the "term frequency" (TF) of the word. The definition of term frequency may optionally be modified to weight different portions of the text unequally: for example, any occurrence of a word in the text's title might be counted as
a 3-fold or, more generally, k-fold occurrence (as if the title had been repeated k times within the text), in order to reflect a heuristic assumption that the words in the title are particularly important indicators of the text's content or topic.

However, for lengthy textual attributes, such as the text of an entire document, the score of a word is typically defined to be not merely its term frequency, but its term frequency multiplied by the negated logarithm of the word's "global frequency," as measured with respect to the textual attribute in question. The global frequency of a word, which effectively measures the word's uninformativeness, is a fraction between 0 and 1, defined to be the fraction of all target objects for which the textual attribute in question contains the word. This adjusted score is often known in the art as TF/IDF ("term frequency times inverse document frequency"). When the global frequency of a word is taken into account in this way, the common, uninformative words have scores comparatively close to zero, no matter how often or rarely they appear in the text. Thus, their rate has little influence on the object's target profile. Alternative methods of calculating word scores include latent semantic indexing or probabilistic models.

Instead of breaking up the text into individual words, one could alternatively break the text into overlapping word bigrams (sequences of 2 adjacent words), or, more generally, word n-grams. These word n-grams may be scored in the same way as individual words. Another possibility is to use character n-grams. For example, this sentence contains a sequence of overlapping character 5-grams which starts "for e", "or ex", "r exa", " exam", "examp", etc. The sentence may be characterized, imprecisely but usefully, by the score of each possible character 5-gram ("aaaaa", "aaaab", ... "zzzzz") in the sentence. Conceptually speaking, in the character 5-gram case, the textual attribute would be decomposed into at least \(26^5 = 11{,}881{,}376\) numeric attributes. Of course, for a given target object, most of these numeric attributes have values of 0, since most 5-grams do not appear in the target object attributes. These zero values need not be stored anywhere. For purposes of digital storage, the value of a textual attribute could be characterized by storing the set of character 5-grams that actually do appear in the text, together with the nonzero score of each one. Any 5-gram that is not included in the set can be assumed to have a score of zero.

The decomposition of textual attributes is not limited to attributes whose values are expected to be long texts. A simple, one-term textual attribute can be replaced by a collection of numeric attributes in exactly the same way. Consider again the case where the target objects are movies. The "name of director" attribute can be replaced by numeric attributes giving the scores for "Federico-Fellini," "Woody-Allen," "Terence-Davies," and so forth, in that attribute. For these one-term textual attributes, the score of a word is usually defined to be its rate in the text, without any consideration of global frequency. Note that under these conditions, one of the scores is 1, while the other scores are 0 and need not be stored. For example, if Davies did direct the film, then it is "Terence-Davies" whose score is 1, since "Terence-Davies" constitutes 100% of the words in the textual value of the "name of director" attribute. It might seem that nothing has been gained over simply regarding the textual attribute as having the string value "Terence-Davies." However, the trick of decomposing every non-numeric attribute into a collection of numeric attributes proves useful for the clustering and decision tree methods described later, which require the attribute values of different objects to be averaged and/or ordinally ranked. Only numeric attributes can be averaged or ranked in this way.

Just as a textual attribute may be decomposed into a number of component terms (letter or word n-grams), an associative attribute may be decomposed into a number of component associations. For instance, in a domain where the target objects are movies, a typical associative attribute used in profiling a movie would be a list of customers who have rented that movie. This list can be replaced by a collection of numeric attributes, which give the "association scores" between the movie and each of the customers known to the system. For example, the 165th such numeric attribute would be the association score between the movie and customer #165, where the association score is defined to be 1 if customer #165 has previously rented the movie, and 0 otherwise. In a subtler refinement, this association score could be defined to be the degree of interest, possibly zero, that customer #165 exhibited in the movie, as determined by relevance feedback (as described below). As another example, in a domain where target objects are companies, an associative attribute indicating the major shareholders of the company would be decomposed into a collection of association scores, each of which would indicate the percentage of the company (possibly zero) owned by some particular individual or corporate body. Just as with the term scores used in decomposing lengthy textual attributes, each association score may optionally be adjusted by a multiplicative factor: for example, the association score between a movie and customer #165 might be multiplied by the negated logarithm of the "global frequency" of customer #165, i.e., the fraction of all movies that have been rented by customer #165. Just as with the term scores used in decomposing textual attributes, most association scores found when decomposing a particular value of an associative attribute are zero, and a similar economy of storage may be gained in exactly the same manner by storing a list of only those ancillary objects with which the target object has a nonzero association score, together with their respective association scores.

Similarity Measures

What does it mean for two target objects to be similar? More precisely, how should one measure the degree of similarity? Many approaches are possible, and any reasonable metric that can be computed over the set of target object profiles can be used, where target objects are considered to be similar if the distance between their profiles is small according to this metric. Thus, the following preferred embodiment of a target object similarity measurement system has many variations.

First, define the distance between two values of a given attribute according to whether the attribute is a numeric, associative, or textual attribute. If the attribute is numeric, then the distance between two values of the attribute is the absolute value of the difference between the two values. (Other definitions are also possible: for example, the distance between prices \(p_1\) and \(p_2\) might be defined by \(|p_1 - p_2| / (\max(p_1, p_2) + 1)\), to recognize that when it comes to customer interest, $5000 and $5020 are very similar, whereas $3 and $23 are not.) If the attribute is associative, then its value V may be decomposed as described above into a collection of real numbers, representing the association scores between the target object in question and various ancillary objects. V may therefore be regarded as a vector with components \(V_1, V_2, V_3\), etc., representing the association scores between the object and ancillary objects 1, 2, 3, etc., respectively. The distance between two vector values V and U of an associative attribute is then computed using the angle distance measure, \(\arccos\big(VU' / \sqrt{(VV')(UU')}\big)\). (Note that the three inner products in this expression have the form
\(XY' = X_1Y_1 + X_2Y_2 + X_3Y_3 + \cdots\), and that for efficient computation, terms of the form \(X_iY_i\) may be omitted from this sum if either of the scores \(X_i\) and \(Y_i\) is zero.) Finally, if the attribute is textual, then its value V may be decomposed as described above into a collection of real numbers, representing the scores of various word n-grams or character n-grams in the text. Then the value V may again be regarded as a vector, and the distance between two values is again defined via the angle distance measure. Other similarity metrics between two vectors, such as the Dice measure, may be used instead. It happens that the obvious alternative metric, Euclidean distance, does not work well: even similar texts tend not to overlap substantially in the content words they use, so that texts encountered in practice are all substantially orthogonal to each other, assuming that TF/IDF scores are used to reduce the influence of non-content words.
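As a rough illustration of the textual-attribute pipeline, the Python sketch below decomposes each text into a sparse dictionary of TF/IDF scores (term frequency times the negated logarithm of global frequency) and compares two such vectors with the angle distance, summing only over terms present in both. The whitespace tokenizer, the `char_ngrams` helper, and the convention that an all-zero vector counts as orthogonal are assumptions made for this sketch, not details taken from the text.

```python
import math
from collections import Counter

def words(text):
    """Lower-cased whitespace tokens (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()

def char_ngrams(text, n=5):
    """Overlapping character n-grams, e.g. "for e", "or ex", ... for n=5."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tf_idf_vectors(texts, tokenize=words):
    """Map each text to a sparse {term: TF * -log(global frequency)} dict.

    Only nonzero scores are stored; absent terms implicitly score zero.
    """
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    # Global frequency of a term: fraction of texts that contain it.
    doc_counts = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        vec = {}
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)                      # rate of the term in the text
            idf = -math.log(doc_counts[term] / n_docs)    # 0 if the term is in every text
            if tf * idf > 0:
                vec[term] = tf * idf
        vectors.append(vec)
    return vectors

def dot(x, y):
    """Sparse inner product XY'; only terms present in both vectors are summed."""
    if len(x) > len(y):
        x, y = y, x
    return sum(score * y[term] for term, score in x.items() if term in y)

def angle_distance(v, u):
    """arccos(VU' / sqrt((VV')(UU'))), in radians."""
    denom = math.sqrt(dot(v, v) * dot(u, u))
    if denom == 0:
        return math.pi / 2  # assumption: an empty vector counts as orthogonal
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, dot(v, u) / denom)))
```

With `docs = ["the kennedy speech", "the kennedy election", "the stock market"]`, the word "the" occurs in every text, has global frequency 1, and drops out of every vector; the two Kennedy texts end up closer to each other than either is to the stock-market text, which shares no terms with them and sits at exactly π/2 radians. Passing `tokenize=char_ngrams` gives the character 5-gram variant instead.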
The scores of two words in a textual attribute vector may be correlated; for example, "Kennedy" and "JFK" tend to appear in the same documents. Thus it may be advisable to alter the text somewhat before computing the scores of terms in the text, by using a synonym dictionary that groups together similar words. The effect of this optional pre-alteration is that two texts using related words are measured to be as similar as if they had actually used the same words. One technique is to augment the set of words actually found in the article with a set of synonyms or other words which tend to co-occur with the words in the article, so that "Kennedy" could be added to every article that mentions "JFK." Alternatively, words found in the article may be wholly replaced by synonyms, so that "JFK" might be replaced by "Kennedy" or by "John F. Kennedy" wherever it appears. In either case, the result is that documents about Kennedy and documents about JFK are adjudged similar.
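The two synonym techniques just described, augmentation and wholesale replacement, can be sketched as a preprocessing pass over a word list that runs before any scores are computed. The one-entry `SYNONYMS` table is a hypothetical stand-in for a real synonym dictionary, and multi-word replacements such as "John F. Kennedy" are not handled in this sketch.

```python
# Hypothetical synonym dictionary mapping a word to a canonical synonym.
SYNONYMS = {"jfk": "kennedy"}

def augment_with_synonyms(words, synonyms=SYNONYMS):
    """Keep every original word and append canonical synonyms not yet present."""
    out = list(words)
    for word in words:
        canonical = synonyms.get(word)
        if canonical is not None and canonical not in out:
            out.append(canonical)
    return out

def replace_with_synonyms(words, synonyms=SYNONYMS):
    """Replace each word wholesale by its canonical synonym, if it has one."""
    return [synonyms.get(word, word) for word in words]
```

Either pass leaves "jfk speech" and "kennedy speech" sharing the term "kennedy", so their score vectors are no longer orthogonal under the angle distance.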
The synonym dictionary may be sensitive to the topic of the document as a whole; for example, it may recognize that "crane" is likely to have a different synonym in a document that mentions birds than in a document that mentions construction. A related technique is to replace each word by its morphological stem, so that "staple", "stapler", and "staples" are all replaced by "staple". Common function words ("a", "the", ...) can influence the calculated similarity of texts without regard to their topics, and so are typically removed from the text before the scores of terms in the text are computed. A more general approach to recognizing synonyms is to use a revised measure of the distance between textual attribute vectors V and U, namely \(\arccos\big((AV)(AU)' / \sqrt{(AV)(AV)'\,(AU)(AU)'}\big)\), where the matrix A is the dimensionality-reducing linear transformation (or an approximation thereto) determined by collecting the vector values of the textual attribute, for all target objects known to the system, and applying singular value decomposition to the resulting collection. The same approach can be applied to the vector values of associative attributes. The above definitions allow us to determine how close together two target objects are with respect to a single attribute, whether numeric, associative, or textual. The distance between two target objects X and Y with respect to their entire multi-attribute profiles \(P_X\) and \(P_Y\) is then denoted \(d(X, Y)\) or \(d(P_X, P_Y)\) and defined as:

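\[
d(X, Y) = \Big( \big[(\text{distance with respect to attribute } a)(\text{weight of attribute } a)\big]^k + \big[(\text{distance with respect to attribute } b)(\text{weight of attribute } b)\big]^k + \big[(\text{distance with respect to attribute } c)(\text{weight of attribute } c)\big]^k + \cdots \Big)^{1/k}
\]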

where k is a fixed positive real number, typically 2, and the weights are non-negative real numbers indicating the relative importance of the various attributes. For example, if the



target objects are consumer goods, and the weight of the "color" attribute is comparatively very small, then color is not a consideration in determining similarity: a user who likes a brown massage cushion is predicted to show equal interest in the same cushion manufactured in blue, and vice versa. On the other hand, if the weight of the "color" attribute is comparatively very high, then users are predicted to show interest primarily in products whose colors they have liked in the past: a brown massage cushion and a blue massage cushion are not at all the same kind of target object, however similar in other attributes, and a good experience with one does not by itself inspire much interest in the other.
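The weighted combination d(X,Y) can be sketched as follows, under the assumptions that a profile is a Python dict from attribute name to either a number (numeric attribute) or a sparse dict vector (textual or associative attribute), and that every weighted attribute is defined for both objects. The attribute names and weight values are illustrative; `price_distance` implements the relative price distance mentioned earlier and can be plugged in per attribute through the hypothetical `distance_fns` parameter.

```python
import math

def dot(x, y):
    """Sparse inner product over terms present in both vectors."""
    return sum(score * y[term] for term, score in x.items() if term in y)

def angle_distance(v, u):
    """arccos(VU' / sqrt((VV')(UU'))), in radians."""
    denom = math.sqrt(dot(v, v) * dot(u, u))
    if denom == 0:
        return math.pi / 2  # assumption: an empty vector counts as orthogonal
    return math.acos(max(-1.0, min(1.0, dot(v, u) / denom)))

def price_distance(p1, p2):
    """Relative price distance |p1 - p2| / (max(p1, p2) + 1)."""
    return abs(p1 - p2) / (max(p1, p2) + 1)

def attribute_distance(x, y):
    """Angle distance for sparse vectors, absolute difference for numbers."""
    if isinstance(x, dict):
        return angle_distance(x, y)
    return abs(x - y)

def profile_distance(px, py, weights, k=2.0, distance_fns=None):
    """d(X,Y) = (sum over attributes a of (distance_a * weight_a)^k)^(1/k).

    distance_fns optionally overrides the per-attribute distance,
    e.g. {"price": price_distance}.
    """
    distance_fns = distance_fns or {}
    total = 0.0
    for attr, weight in weights.items():
        dist_fn = distance_fns.get(attr, attribute_distance)
        total += (dist_fn(px[attr], py[attr]) * weight) ** k
    return total ** (1.0 / k)
```

With the color weight at 0.01, a brown and a blue cushion at the same price come out nearly identical; raising the color weight to 10 pushes them far apart, mirroring the example in the text.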
Target objects may be of various sorts, and it is sometimes advantageous to use a single system that is able to compare target objects of distinct sorts. For example, in a system where some target objects are novels while other target objects are movies, it is desirable to judge a novel and a movie similar if their profiles show that similar users like them (an associative attribute). However, it is important to note that certain attributes specified in the movie's target profile are undefined in the novel's target profile, and vice versa: a novel has no "cast list" associative attribute and a movie has no "reading level" numeric attribute. In general, a system in which target objects fall into distinct sorts may sometimes have to measure the similarity of two target objects for which somewhat different sets of attributes are defined. This requires an extension to the distance metric \(d(\cdot, \cdot)\) defined above. In certain applications, it is sufficient when carrying out such a comparison simply to disregard attributes that are not defined for both target objects: this allows a cluster of novels to be matched with the most similar cluster of movies, for example, by considering only those attributes that novels and movies have in common.
However, while this method allows comparisons between (say) novels and movies, it does not define a proper metric over the combined space of novels and movies and therefore does not allow clustering to be applied to the set of all target objects. When necessary for clustering or other purposes, a metric that allows comparison of any two target objects (whether of the same or different sorts) can be defined as follows. If a is an attribute, then let Max(a) be an upper bound on the distance between two values of attribute a; notice that if attribute a is an associative or textual attribute, this distance is an angle determined by arccos, so that Max(a) may be chosen to be 180 degrees, while if attribute a is a numeric attribute, a sufficiently large number must be selected by the system designers. The distance between two values of attribute a is given as before in the case where both values are defined; the distance between two undefined values is taken to be zero; finally, the distance between a defined value and an undefined value is always taken to be Max(a)/2. This allows us to determine how close together two target objects are with respect to an attribute a, even if attribute a does not have a defined value for both target objects. The distance \(d(\cdot, \cdot)\) between two target objects with respect to their entire multi-attribute profiles is then given in terms of these individual attribute distances exactly as before. It is assumed that one attribute in such a system specifies the sort of target object ("movie", "novel", etc.), and that this attribute may be highly weighted if target objects of different sorts are considered to be very different despite any attributes they may have in common.
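A minimal sketch of the extended metric, assuming that a missing dictionary key means "undefined". Angles are measured in radians, so Max(a) = 180 degrees becomes π; the bound of 20 for the hypothetical `reading_level` attribute stands in for the "sufficiently large number" the system designers would choose, and the other attribute names are likewise illustrative.

```python
import math

def dot(x, y):
    """Sparse inner product over terms present in both vectors."""
    return sum(score * y[term] for term, score in x.items() if term in y)

def angle_distance(v, u):
    """arccos(VU' / sqrt((VV')(UU'))), in radians."""
    denom = math.sqrt(dot(v, v) * dot(u, u))
    if denom == 0:
        return math.pi / 2  # assumption: an empty vector counts as orthogonal
    return math.acos(max(-1.0, min(1.0, dot(v, u) / denom)))

# Max(a): upper bound on the distance between two values of attribute a.
MAX_DIST = {
    "liked_by": math.pi,     # associative: an angle, at most 180 degrees
    "cast_list": math.pi,    # associative, defined only for movies
    "reading_level": 20.0,   # numeric, defined only for novels (hypothetical bound)
}

def extended_attribute_distance(attr, x, y):
    """Per-attribute distance; None marks an undefined value."""
    if x is None and y is None:
        return 0.0                      # both values undefined
    if x is None or y is None:
        return MAX_DIST[attr] / 2       # exactly one value undefined
    if isinstance(x, dict):
        return angle_distance(x, y)
    return abs(x - y)

def extended_distance(px, py, weights, k=2.0):
    """Weighted combination of attribute distances, exactly as for same-sort objects."""
    total = 0.0
    for attr, weight in weights.items():
        d = extended_attribute_distance(attr, px.get(attr), py.get(attr))
        total += (d * weight) ** k
    return total ** (1.0 / k)
```

To encode the sort of each object, a heavily weighted one-term textual attribute such as `{"movie": 1.0}` versus `{"novel": 1.0}` could be added to the profiles and weights, as the last sentence of the passage suggests.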

UTILIZING THE SIMILARITY MEASUREMENT 
Matching Buyers and Sellers 

A simple application of the similarity measurement is a 
system to match buyers with sellers in small-volume 
markets, such as used cars and other used goods, artwork, or 
employment. Sellers submit profiles of the goods (target objects) they want to sell, and buyers submit profiles of the
goods (target objects) they want to buy. Participants may 
submit or withdraw these profiles at any time. The system 
for customized electronic identification of desirable objects 
computes the similarities between seller-submitted profiles 
and buyer-submitted profiles, and when two profiles match 
closely (i.e., the similarity is above a threshold), the corre- 
sponding seller and buyer are notified of each other's 
identities. To prevent users from being flooded with 
responses, it may be desirable to limit the number of 
notifications each user receives to a fixed number, such as 
ten per day. 
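The matching step can be sketched as follows, assuming a caller-supplied distance function in which a small distance means high similarity, so that "similarity above a threshold" becomes "distance below a threshold". The function name, the nearest-first ordering, and counting the daily cap separately for each seller and each buyer are assumptions of this sketch; resetting the counts each day is left to the caller.

```python
from collections import Counter

def match_buyers_and_sellers(sellers, buyers, distance, threshold, daily_cap=10):
    """Return (seller_id, buyer_id) pairs whose profiles match closely.

    sellers, buyers: dicts mapping a participant id to a submitted profile.
    distance: function of two profiles; smaller means more similar.
    No participant is notified more than daily_cap times.
    """
    candidates = []
    for s_id, s_profile in sellers.items():
        for b_id, b_profile in buyers.items():
            d = distance(s_profile, b_profile)
            if d < threshold:
                candidates.append((d, s_id, b_id))
    candidates.sort(key=lambda c: c[0])  # closest matches are notified first
    sent = Counter()
    notices = []
    for _, s_id, b_id in candidates:
        if sent[("seller", s_id)] < daily_cap and sent[("buyer", b_id)] < daily_cap:
            notices.append((s_id, b_id))
            sent[("seller", s_id)] += 1
            sent[("buyer", b_id)] += 1
    return notices
```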

Filtering: Relevance Feedback

A filtering system is a device that can search through many target objects and estimate a given user's interest in each target object, so as to identify those that are of greatest interest to the user. The filtering system uses relevance feedback to refine its knowledge of the user's interests: whenever the filtering system identifies a target object as potentially interesting to a user, the user (if an on-line user) provides feedback as to whether or not that target object really is of interest. Such feedback is stored long-term in summarized form, as part of a database of user feedback information, and may be provided either actively or passively. In active feedback, the user explicitly indicates his or her interest, for instance, on a scale of -2 (active distaste) through 0 (no special interest) to 10 (great interest). In passive feedback, the system infers the user's interest from the user's behavior. For example, if target objects are textual documents, the system might monitor which documents the user chooses to read, or not to read, and how much time the user spends reading them. A typical formula for assessing interest in a document via passive feedback, in this domain, on a scale of 0 to 10, might be:

+2 if the second page is viewed,
+2 if all pages are viewed,
+2 if more than 30 seconds was spent viewing the document,
+2 if more than one minute was spent viewing the document,
+2 if the minutes spent viewing the document are greater than half the number of pages.
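The five criteria above translate directly into a scoring function. In this sketch the caller is assumed to supply the set of viewed page numbers (1-based), the document's page count, and the viewing time in seconds; the function and parameter names are hypothetical.

```python
def passive_interest_score(viewed_pages, total_pages, seconds_viewed):
    """Passive-feedback interest on a 0-10 scale, two points per criterion."""
    score = 0
    if 2 in viewed_pages:                                   # second page viewed
        score += 2
    if set(range(1, total_pages + 1)) <= set(viewed_pages):  # all pages viewed
        score += 2
    if seconds_viewed > 30:                                 # more than 30 seconds
        score += 2
    if seconds_viewed > 60:                                 # more than one minute
        score += 2
    if seconds_viewed / 60 > total_pages / 2:               # minutes > half the page count
        score += 2
    return score
```

A reader who opens only the first page of a three-page article for ten seconds scores 0, while one who reads all three pages for two minutes reaches the maximum of 10.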
If the target objects are electronic mail messages, interest points might also be added in the case of a particularly lengthy or particularly prompt reply. If the target objects are purchasable goods, interest points might be added for target objects that the user actually purchases, with further points in the case of a large-quantity or high-price purchase. In any domain, further points might be added for target objects that the user accesses early in a session, on the grounds that users access the objects that most interest them first. Other potential sources of passive feedback include an electronic measurement of the extent to which the user's pupils dilate while the user views the target object or a description of the target object. It is possible to combine active and passive feedback.



… user's screen, can be used to continuously display the passive feedback score estimated by the system for the target object being viewed, unless the user has manually adjusted the indicator by a mouse operation or other means in order to reflect a different score for this target object, after which the indicator displays the active feedback score selected by the user, and this active feedback score is used by the system instead of the passive feedback score. In a variation, the user cannot see or adjust the indicator until just after the user has finished viewing the target object. Regardless of how a user's feedback is computed, it is stored long-term as part of that user's target profile interest summary.
Filtering: rwtyriiftinjnff Thp t caJ Interest Through Siinilarity 

Relevance feedback only determines the usct? interest in " 
ce rtain targcL pbjects: namely, the target objects that the user 
his^aetualiy had the opportunity to evaluate (whether 
actively or passively). For target objects that the user has not 
yet seen, the filtering system must estimate the user's 
interest This estimation task is the heart of the filtering 
20 problem, and the reason that ttw ^iHl*rfty ^<ypmwit is 
important. More concretely, the preferred embodiment of the 
filtering system is a news dipping service that periodically 
prcsents jhe user with pews, articles,of pot ential interes t The 
useTprovkles^ active and/or passive feedback to the system 
23 reja flng*to these presented articles . However, the system 
does not nave feedback information from the user for 
articles that have never been presented to the user, such as 
new articles mat have just been added to the database, or old 
articles that the system chose not to present to the user. 
30 Similarly, in the dating service domain where target objects 
are prospective romantic partners, the system has only 
received feedback on old flames, not on prospective new 
loves. 

C^\\ As shown in flow diagram form in FIG. 12. the evaluation 
35 of the likelihood of interest in a particular target object for 
a sp ecific user can automatically be computed . The interest 
thata given target object X holds for a usertTis assumed to 
be a sum of two quantities: q(U, X). the intrinsic "quality" 
of X, plus f(U, 70. the topical interesT that users like U 
have in target objects like X. For any target object X. the 
intrinsic quality measure q(U. X) is easily estimated at steps 
I2t 1-12*3 directly from numeric attributes of the target 
object X The computation process begins at step 12*1, 
where certain designated numeric attributes of target object 
X are specifically selected, which attributes by their very 
nature should be positively or negatively correlated with 
use rs* interest Such attributes, termed "quality attributes," 
have the normative property that the higher (or in some cases 
lower) their value, the more interesting a user is expected to 
find them. Quality attributes of target object X may includr, 
but are not limited to, target object X*s popularity among 
users in general, the rating a particular reviewer has given 
target object X. the age (time since authorship — also known 
as outdatedness) of target object X. the number of vulgar 
55 words used in target object X. the price of target object X. 
and the amount of money that the company selling target 
object X has donated to the user's favorite charity. At step 
1202. e ach of the selected attributes is multiplied by a 
p ositive or negative weight indicative of the strength of user 
U ' s^ieterencefor those target objects that have high values 
fortius attribiitcTwrnch wcighTmust^ retrieved fronraaata 
fUc"sT5nng quality attribute weights for the selected user. At 
step 1203*. a weighted sum of the identified weighted 
selected attributes is computed to determine the intrinsic 
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60 



One option is to take a weighted average of the two ratings. 
Another option is to use passive feedback by default but to 

f feedbacks scoreT 'ln^ttie scenai^a^yeT^^nstance^n 
uninteresting article may sometimes remain on the display 
device a long period while the user is engaged in unrelated 
business; the passive feedback score is then inappropriately / 

high, and the user may wish to correct it before continuing^ 65 quality measure q(U, X). At step 12*4. the summarized 
In the preferred embodiment of the invention, a visual 



indicator, such as a sliding bs or indicator needle on the 



weighted relevance feedback data is retrieved, wherein some 
relevance feedback points are weighted more heavily than 
others and the stored relevance data can be summarized to some
degree, for example by the use of search profile sets. The more
difficult part of determining user U's interest in target object
X is to find or compute at step 1205 the value of f(U, X), which
denotes the topical interest that users like U generally have in
target objects like X. The method of determining a user's
interest relies on the following heuristic: when X and Y are
similar target objects (have similar attributes), and U and V are
similar users (have similar attributes), then topical interest
f(U, X) is likely to have a similar value to the value of topical
interest f(V, Y). This heuristic leads to an effective method
because estimated values of the topical interest function f(*, *)
are actually known for certain arguments to that function:
specifically, if user V has provided a relevance-feedback rating
of r(V, Y) for target object Y, then insofar as that rating
represents user V's true interest in target object Y, we have
r(V, Y) = q(V, Y) + f(V, Y) and can estimate f(V, Y) as
r(V, Y) - q(V, Y). Thus, the problem of estimating topical
interest at all points becomes a problem of interpolating among
these estimates of topical interest at selected points, such as
the feedback estimate of f(V, Y) as r(V, Y) - q(V, Y). This
interpolation can be accomplished with any standard smoothing
technique, using as input the known point estimates of the value
of the topical interest function f(*, *), and determining as
output a function that approximates the entire topical interest
function f(*, *).

Not all point estimates of the topical interest function f(*, *)
should be given equal weight as inputs to the smoothing
algorithm. Since passive relevance feedback is less reliable than
active relevance feedback, point estimates made from passive
relevance feedback should be weighted less heavily than point
estimates made from active relevance feedback, or even not used
at all. In most domains, a user's interests may change over time
and, therefore, estimates of topical interest that derive from
more recent feedback should also be weighted more heavily. A
user's interests may vary according to mood, so estimates of
topical interest that derive from the current session should be
weighted more heavily for the duration of the current session,
and past estimates of topical interest made at approximately the
current time of day or on the current weekday should be weighted
more heavily. Finally, in domains where users are trying to
locate target objects of long-term interest (investments,
romantic partners, pen pals, employers, employees, suppliers,
service providers) from the possibly meager information provided
by the target profiles, the users are usually not in a position
to provide reliable immediate feedback on a target object, but
can provide reliable feedback at a later date. An estimate of
topical interest f(V, Y) should be weighted more heavily if user
V has had more experience with target object Y. Indeed, a useful
strategy is for the system to track long-term feedback for such
target objects. For example, if target profile Y was created in
1990 to describe a particular investment that was available in
1990, and that was purchased in 1990 by user V, then the system
solicits relevance feedback from user V in the years 1990, 1991,
1992, 1993, 1994, 1995, etc., and treats these as successively
stronger indications of user V's true interest in target profile
Y, and thus as indications of user V's likely interest in new
investments whose current profiles resemble the original 1990
investment profile Y. In particular, if in 1994 and 1995 user V
is well-disposed toward his or her 1990 purchase of the
investment described by target profile Y, then in those years and
later, the system tends to recommend additional investments when
they have profiles like target profile Y, on the grounds that
they too will turn out to be satisfactory in 4 to 5 years. It
makes these recommendations both to user V and to users whose
investment portfolios and other attributes are similar to user
V's. The relevance feedback provided by user V in this case may
be either active (feedback = satisfaction ratings provided by the
investor V) or passive (feedback = difference between average
annual return of the investment and average annual return of the
Dow Jones index portfolio since purchase of the investment).

To apply the heuristic above, the system must be able to measure
the distance between (U, X) and (V, Y), for any users U and V and
any target objects X and Y. We have already seen how to define
the distance d(X, Y) between two target objects X and Y, given
their attributes. We may regard a pair such as (U, X) as an
extended object that bears all the attributes of target object X
and all the attributes of user U; then the distance between
(U, X) and (V, Y) may be computed in exactly the same way. This
approach requires user U, user V, and all other users to have
some attributes of their own stored in the system: for example,
age (numeric), social security number (textual), and list of
documents previously retrieved (associative). It is these
attributes that determine the notion of "similar users." Thus it
is desirable to generate profiles of users (termed "user
profiles") as well as profiles of target objects (termed "target
profiles"). Some attributes employed for profiling users may be
related to the attributes employed for profiling target objects:
for example, using associative attributes, it is possible to
characterize target objects such as X by the interest that
various users have shown in them, and simultaneously to
characterize users such as U by the interest that they have shown
in various target objects. In addition, user profiles may make
use of any attributes that are useful in characterizing humans,
such as those suggested in the example domain above where target
objects are potential consumers. Notice that user U's interest
can be estimated even if user U is a new user or an off-line user
who has never provided any feedback, because the relevance
feedback of users whose attributes are similar to U's attributes
is taken into account.

For some uses of filtering systems, when estimating topical
interest, it is appropriate to make an additional "presumption of
no topical interest" (or "bias toward zero"). To understand the
usefulness of such a presumption, suppose the system needs to
determine whether target object X is topically interesting to
user U, but that users like user U have never provided feedback
on target objects even remotely like target object X. The
presumption of no topical interest says that if this is so, it
is because users like user U are simply not interested in such
target objects and therefore do not seek them out and interact
with them. On this presumption, the system should estimate
topical interest f(U, X) to be low. Formally, this example has
the characteristic that (U, X) is far away from all the points
(V, Y) where feedback is available. In such a case, topical
interest f(U, X) is presumed to be close to zero, even if the
value of the topical interest function f(*, *) is high at all the
faraway surrounding points at which its value is known. When a
smoothing technique is used, such a presumption of no topical
interest can be introduced, if appropriate, by manipulating the
input to the smoothing technique. In addition to using observed
values of the topical interest function f(*, *) as input, the
trick is to also introduce fake observations of the form
f(V, Y) = 0 for a lattice of points (V, Y) distributed throughout
the multidimensional space. These fake observations should be
given relatively low weight as inputs to the smoothing algorithm.
The more strongly they are weighted, the stronger the presumption
of no interest.
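The estimation scheme described above has four parts:

- quality is a weighted sum of quality attributes;
- topical interest is interpolated from point estimates r(V, Y) - q(V, Y);
- those point estimates carry reliability weights;
- fake zero observations are given low weight.

It can be sketched as a kernel-weighted average. This is an illustrative sketch only, under several assumptions:

- each (user, target object) pair has already been reduced to a numeric vector by concatenating the user's and the target object's numeric attributes;
- the smoothing technique is Euclidean distance with an exponential kernel;
- the names `Feedback`, `estimate_topical_interest`, `fake_weight`, and `bandwidth` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]  # numeric profile of a (user, target object) pair (assumption)


@dataclass
class Feedback:
    point: Point              # the pair (V, Y) as a numeric vector
    rating: float             # relevance feedback r(V, Y)
    quality: float            # intrinsic quality q(V, Y)
    reliability: float = 1.0  # e.g. lower for passive feedback, higher for recent feedback


def quality(attributes: Sequence[float], weights: Sequence[float]) -> float:
    """q(U, X): weighted sum of X's quality attributes, using U's weights."""
    return sum(w * a for w, a in zip(weights, attributes))


def estimate_topical_interest(pair: Point, observations: Sequence[Feedback],
                              lattice: Sequence[Point] = (),
                              fake_weight: float = 0.05,
                              bandwidth: float = 1.0) -> float:
    """Kernel-weighted interpolation of the point estimates r(V, Y) - q(V, Y).

    Each point in `lattice` is a fake observation f = 0 with low weight,
    which pulls estimates far from any real feedback toward zero.
    """
    numerator = 0.0
    denominator = 0.0
    for obs in observations:
        w = obs.reliability * math.exp(-math.dist(pair, obs.point) / bandwidth)
        numerator += w * (obs.rating - obs.quality)
        denominator += w
    for fake in lattice:
        # A fake observation adds 0 to the numerator but weight to the denominator.
        denominator += fake_weight * math.exp(-math.dist(pair, fake) / bandwidth)
    return numerator / denominator if denominator > 0 else 0.0


def predicted_interest(pair: Point, attributes: Sequence[float],
                       weights: Sequence[float],
                       observations: Sequence[Feedback],
                       lattice: Sequence[Point] = ()) -> float:
    """Estimated interest q(U, X) + f(U, X)."""
    return quality(attributes, weights) + estimate_topical_interest(
        pair, observations, lattice)
```

With no lattice, a pair that coincides with a single observation simply recovers that observation's point estimate. Adding lattice points pulls estimates for pairs far from all feedback toward zero. Raising `fake_weight` strengthens that presumption.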
The following provides another simple example of an estimation
technique that has a presumption of no interest. Let g be a
decreasing function from non-negative real numbers to
non-negative real numbers, such as g(x) = e^(-x) or
g(x) = min(1, x^(-k)) where k > 1. Estimate topical interest
f(U, X) with the following g-weighted average:

$$
f(U, X) = \frac{\sum_{(V, Y)} g\big(d((U, X), (V, Y))\big)\,\big(r(V, Y) - q(V, Y)\big)}{1 + \sum_{(V, Y)} g\big(d((U, X), (V, Y))\big)}
$$

Here the summations are over all pairs (V, Y) such that user V has
provided feedback r(V, Y) on target object Y, that is, all pairs
(V, Y) such that relevance feedback r(V, Y) is defined. The
constant 1 in the denominator supplies the presumption of no
interest: when (U, X) is far from every such (V, Y), all the g
terms are small and the estimate tends toward zero. Note that
both with this technique and with conventional smoothing
techniques, the estimate of the topical interest f(U, X) is not
necessarily equal to r(U, X) - q(U, X), even when r(U, X) is
defined.

Filtering: Adjusting Weights and Residue Feedback

The method described above requires the filtering system to
measure distances between (user, target object) pairs, such as
the distance between (U, X) and (V, Y). Given the means described
earlier for measuring the distance between two multi-attribute
profiles, the method must therefore associate a weight with each
attribute used in the profile of (user, target object) pairs,
that is, with each attribute used to profile either users or
target objects. These weights specify the relative importance of
the attributes in establishing similarity or difference, and
therefore, in determining how topical interest is generalized
from one (user, target object) pair to another. Additional
weights determine which attributes of a target object contribute
to the quality function q, and by how much.

It is possible and often desirable for a filtering system to
store a different set of weights for each user. For example, a
user who thinks of two-star films as having materially different
topic and style from four-star films wants to assign a high
weight to "number of stars" for purposes of the similarity
distance measure d(*, *); this means that interest in a two-star
film does not necessarily signal interest in an otherwise similar
four-star film, or vice-versa. If the user also agrees with the
critics, and actually prefers four-star films, the user also
wants to assign "number of stars" a high positive weight in the
determination of the quality function q. In the same way, a user
who dislikes vulgarity wants to assign the "vulgarity score"
attribute a high negative weight in the determination of the
quality function q, although the "vulgarity score" attribute does
not necessarily have a high weight in determining the topical
similarity of two films.

Attribute weights (of both sorts) may be set or adjusted by the
system administrator or the individual user, on either a
temporary or a permanent basis. However, it is often desirable
for the filtering system to learn attribute weights
automatically, based on relevance feedback. The optimal attribute
weights for a user U are those that allow the most accurate
prediction of user U's interests. That is, with the distance
measure and quality function defined by these attribute weights,
user U's interest in target object X, q(U, X) + f(U, X), can be
accurately estimated by the techniques above. The effectiveness
of a particular set of attribute weights for user U can therefore
be gauged by seeing how well it predicts user U's known
interests.

Formally, suppose that user U has previously provided feedback on
target objects X_1, X_2, X_3, ..., X_n, and that the feedback
ratings are r(U, X_1), r(U, X_2), r(U, X_3), ..., r(U, X_n).
Values of feedback ratings r(*, *) for other users and other
target objects may also be known. The system may use the
following procedure to gauge the effectiveness of the set of
attribute weights it currently stores for user U:

(i) For each 1 <= i <= n, use the estimation techniques to
estimate q(U, X_i) + f(U, X_i) from all known values of feedback
ratings r. Call this estimate a_i.

(ii) Repeat step (i), but this time make the estimate for each
1 <= i <= n without using the feedback ratings r(U, X_j) as input
for any j such that the distance d(X_i, X_j) is smaller than a
fixed threshold. That is, estimate each q(U, X_i) + f(U, X_i) from
other values of feedback rating r only; in particular, do not use
r(U, X_i) itself. Call this estimate b_i. The difference a_i - b_i
is herein termed the "residue feedback" r_res(U, X_i) of user U on
target object X_i.

(iii) Compute user U's error measure,

$$
\sum_{i=1}^{n} (a_i - b_i)^2 = (a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2 .
$$

A gradient-descent or other numerical optimization method may be
used to adjust user U's attribute weights so that this error
measure reaches a (local) minimum. This approach tends to work
best if the smoothing technique used in estimation is such that
the value of f(V, Y) is strongly affected by the point estimate
r(V, Y) - q(V, Y) when the latter value is provided as input.
Otherwise, the presence or absence of the single input feedback
rating r(U, X_i) in steps (i)-(ii) may not make a_i and b_i very
different from each other. A slight variation of this learning
technique adjusts a single global set of attribute weights for
all users, by adjusting the weights so as to minimize not a
particular user's error measure but rather the total error
measure of all users. These global weights are used as a default
initial setting for a new user who has not yet provided any
feedback. Gradient descent can then be employed to adjust this
user's individual weights over time. Even when the attribute
weights are chosen to minimize the error measure for user U, the
error measure is generally still positive, meaning that residue
feedback from user U has not been reduced to 0 on all target
objects. It is useful to note that high residue feedback from a
user U on a target object X indicates that user U liked target
object X unexpectedly well given its profile, that is, better
than the smoothing model could predict from user U's opinions on
target objects with similar profiles. Similarly, low residue
feedback indicates that user U liked target object X less than
was expected. By definition, this unexplained preference or
dispreference cannot be the result of topical similarity, and
therefore must be regarded as an indication of the intrinsic
quality of target object X. It follows that a useful quality
attribute for a target object X is the average amount of residue
feedback r_res(V, X) from users on that target object, averaged
over all users V who have provided relevance feedback on the
target object. In a variation of this idea, residue feedback is
never averaged indiscriminately over all users to form a new
attribute, but instead is smoothed to consider users' similarity
to each other. Recall that the quality measure q(U, X) depends on
the user U as well as the target object X, so that a given target
object X may be perceived by different users to have different
quality. In this variation, as before, q(U, X) is calculated as a
weighted sum of various quality attributes that are dependent
only on X, but then an additional term is added, namely an
estimate of r_res(U, X) found by applying a smoothing algorithm to
known values of r_res(V, X). Here V ranges over all users who have
provided relevance feedback on target object X, and the smoothing
algorithm is sensitive to the distances d(U, V) from each such
user V to user U.

Using the Similarity Computation for Clustering

A method for defining the distance between any pair of target
objects was disclosed above. Given this distance measure, it is
simple to apply a standard clustering algorithm, such as k-means,
to group the target objects into a number of clusters, in such a
way that similar target objects
tend to be grouped in the same cluster. It is clear that the
resulting clusters can be used to improve the efficiency of
matching buyers and sellers in the application described in
section "Matching Buyers and Sellers" above: it is not necessary
to compare every buy profile to every sell profile, but only to
compare buy profiles and sell profiles that are similar enough to
appear in the same cluster. As explained below, the results of
the clustering procedure can also be used to make filtering more
efficient, and in the service of querying and browsing tasks.

The k-means clustering method is familiar to those skilled in the
art. Briefly put, it finds a grouping of points (target profiles,
in this case, whose numeric coordinates are given by numeric
decomposition of their attributes as described above) to minimize
the distance between points in the clusters and the centers of
the clusters in which they are located. This is done by
alternating between assigning each point to the cluster which has
the nearest center and then, once the points have been assigned,
computing the (new) center of each cluster by averaging the
coordinates of the points (target profiles) located in this
cluster. Other clustering methods can be used, such as "soft" or
"fuzzy" k-means clustering, in which objects are allowed to
belong to more than one cluster. This can be cast as a clustering
problem similar to the k-means problem, but now the criterion
being optimized is a little different:

$$
\min \sum_{C} \sum_{i} I_{iC}\, d(x_i, x_C)
$$

where $C$ ranges over cluster numbers, $i$ ranges over target
objects, $x_i$ is the numeric vector corresponding to the profile
of target object number $i$, $x_C$ is the mean of the numeric
vectors corresponding to target profiles of all target objects in
cluster number $C$, termed the "cluster profile" of cluster $C$,
d(*, *) is the metric used to measure distance between two target
profiles, and $I_{iC}$ is a value between 0 and 1 that indicates
how much target object number $i$ is associated with cluster
number $C$, where $I$ is an indicator matrix with the property
that for each $i$, $\sum_{C} I_{iC} = 1$. For k-means clustering,
$I_{iC}$ is either 0 or 1.

Any of these basic types of clustering might be used by the
system:

1) Association-based clustering, in which profiles contain only
associative attributes, and thus distance is defined entirely by
associations. This kind of clustering generally (a) clusters
target objects based on the similarity of the users who like them
or (b) clusters users based on the similarity of the target
objects they like. In this approach, the system does not need any
information about target objects or users, except for their
history of interaction with each other.

2) Content-based clustering, in which profiles contain only
non-associative attributes. This kind of clustering (a) clusters
target objects based on the similarity of their non-associative
attributes (such as word frequencies) or (b) clusters users based
on the similarity of their non-associative attributes (such as
demographics and psychographics). In this approach, the system
does not need to record any information about users' historical
patterns of information access, but it does need information
about the intrinsic properties of users and/or target objects.

3) Uniform hybrid method, in which profiles may contain both
associative and non-associative attributes. This method combines
1a and 2a, or 1b and 2b. The distance d(P_X, P_Y) between two
profiles P_X and P_Y may be computed by the general
similarity-measurement methods described earlier.

4) Sequential hybrid method. First apply the k-means procedure to
do 1a, so that articles are labeled by cluster based on which
users read them, then use supervised clustering (maximum
likelihood discriminant methods) using the word frequencies to do
the process of method 2a described above. This tries to use
knowledge of who read what to do a better job of clustering based
on word frequencies. One could similarly combine the methods 1b
and 2b described above.

Hierarchical clustering of target objects is often useful.
Hierarchical clustering produces a tree that divides the target
objects into clusters, each of which may be further divided into
subclusters, and so on. In the cluster tree of FIG. 8, the leaf
node labeled d denotes a particular target object d, or
equivalently, a single-member cluster consisting of this target
object. Target object d is a member of the cluster (a, b, d),
which is a subset of the cluster (a, b, c, d, e, f), which in turn
is a subset of all target objects. The tree shown in FIG. 8 would
be produced from a set of target objects such as those shown
geometrically in FIG. 7. In FIG. 7, each letter represents a
target object and axes x1 and x2 represent two of the many
numeric attributes on which the target objects differ. Such a
cluster tree may be created by hand, using human judgment to form
clusters and subclusters of similar objects, or may be created
automatically in either of two standard ways: top-down or
bottom-up. In top-down hierarchical clustering, the set of all
target objects in FIG. 7 would be divided into the clusters
(a, b, c, d, e, f) and (g, h, i, j, k). The clustering algorithm
would then be reapplied to the target objects in each cluster, so
that the cluster (g, h, i, j, k) is subpartitioned into the
clusters (g, k) and (h, i, j), and so on to arrive at the tree
shown in FIG. 8. In bottom-up hierarchical clustering, the set of
all target objects in FIG. 7 would be grouped into numerous small
clusters. These clusters would then themselves be grouped into
the larger clusters (a, b, d), (c, e, f), (g, k), and (h, i, j),
according to their cluster profiles. These larger clusters would
themselves be grouped into (a, b, c, d, e, f) and
(g, k, h, i, j), and so on until all target objects had been
grouped together, resulting in the tree of FIG. 8. Note that for
bottom-up clustering to work, it must be possible to apply the
clustering algorithm to a set of existing clusters. This requires
a notion of the distance between two clusters. The method
disclosed above for measuring the distance between target objects
can be applied directly, provided that clusters are profiled in
the same way as target objects. It is only necessary to adopt the
convention that a cluster's profile is the average of the target
profiles of all the target objects in the cluster; that is, to
determine the cluster's value for a given attribute, take the
mean value of that attribute across all the target objects in the
cluster. For the mean value to be well-defined, all attributes
must be numeric, so it is necessary as usual to replace each
textual or associative attribute with its decomposition into
numeric attributes (scores), as described earlier. For example,
the target profile of a single Woody Allen film would assign
"Woody-Allen" a score of 1 in the "name-of-director" field, while
giving "Federico-Fellini" and "Terence-Davies" scores of 0. A
cluster that consisted of 20 films directed by Allen and 5
directed by Fellini would be profiled with scores of 0.8, 0.2,
and 0 respectively, because, for example, 0.8 is the average of
20 ones and 5 zeros.
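The k-means procedure, together with the convention that a cluster's profile is the attribute-wise mean of its members' target profiles, can be sketched in plain Python. This is a minimal illustration with these assumptions:

- profiles have already been decomposed into numeric vectors;
- Euclidean distance stands in for d(*, *);
- it implements ordinary (hard) k-means, not the soft or fuzzy variant;
- the function names `cluster_profile` and `k_means` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def cluster_profile(members: Sequence[Profile]) -> List[float]:
    """A cluster's profile: the mean value of each attribute over its members."""
    return [sum(column) / len(members) for column in zip(*members)]


def k_means(profiles: Sequence[Profile], k: int, iterations: int = 100,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Hard k-means: returns a cluster number per profile and the cluster profiles."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(profiles), k)]  # start from k of the profiles
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: put each profile in the cluster with the nearest center.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                          for p in profiles]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        # Update step: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = cluster_profile(members)
    return assignment, centers


# The director example from the text: the attributes are the scores for
# ("Woody-Allen", "Federico-Fellini", "Terence-Davies").
films = [(1.0, 0.0, 0.0)] * 20 + [(0.0, 1.0, 0.0)] * 5
print(cluster_profile(films))  # [0.8, 0.2, 0.0]
```

The same `cluster_profile` function supplies the profile of a merged cluster in bottom-up clustering. Clusters can therefore be compared with the same distance measure used for individual target objects.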
Searching for Target Objects

Given a target object with target profile P, or alternatively
given a search profile P, a hierarchical cluster tree of target
objects makes it possible for the system to search efficiently
for target objects with target profiles similar to P. It is only
necessary to navigate through the tree, automatically, in search
of such target profiles. The system for customized electronic
identification of desirable objects begins by considering the
largest, top-level clusters, and selects the cluster whose
profile is most similar to target profile P. In the event of a
near-tie, multiple clusters may be selected. Next, the system
considers all subclusters of the selected clusters, and this time
selects the subcluster or subclusters whose profiles are closest
to target profile P. This refinement process is iterated until
the clusters selected on a given step are sufficiently small, and
these are the desired clusters of target objects with profiles
most similar to target profile P. Any hierarchical cluster tree
therefore serves as a decision tree for identifying target
objects. In pseudo-code form, this process is as follows (and in
flow diagram form in FIGS. 13A and 13B):

1. Initialize the list of identified target objects to the empty
list at step 13A00.

2. Initialize the current tree T to be the hierarchical cluster
tree of all objects at step 13A01, and at step 13A02 scan the
current cluster tree for target objects similar to P, using the
process detailed in FIG. 13B. At step 13A03, the list of target
objects is returned.

3. At step 13B00, the variable i is set to 1, and each child
subtree Ti of the root of tree T is retrieved.

4. At step 13B02, calculate d(P, p_i), the similarity distance
between P and p_i.

5. At step 13B03, if d(P, p_i) < t, a threshold, branch to one of
two options:

6. If tree Ti contains only one target object at step 13B04, add
that target object to the list of identified target objects at
step 13B05 and advance to step 13B07.

7. If tree Ti contains multiple target objects at step 13B04, scan
the ith child subtree for target objects similar to P by invoking
the steps of the process of FIG. 13B recursively, and then recurse
to step 3 (step 13A01 in FIG. 13A) with T bound for the duration
of the recursion to tree Ti, in order to search in tree Ti for
target objects with profiles similar to P.

In step 5 of this pseudo-code, smaller thresholds are typically
used at lower levels of the tree, for example by making the
threshold an affine function or other function of the cluster
variance or cluster diameter of the cluster p_i. If the cluster
tree is distributed across a plurality of servers, as described
in the section of this description titled "Network Context of the
Browsing System", this process may be executed in distributed
fashion as follows: steps 3-7 are executed by the server that
stores the root node of hierarchical cluster tree T, and the
recursion in step 7 to a subcluster tree Ti involves the
transmission of a search request to the server that stores the
root node of tree Ti, which server carries out the recursive step
upon receipt of this request. Steps 1-2 are carried out by the
processor that initiates the search, and the server that executes
step 6 must send a message identifying the target object to this
initiating

trained to take the attributes of a target object as input, and
produce as output a unique pattern that can be used to identify
the appropriate low-level cluster. For maximum accuracy,
low-level clusters that are similar to each other (close together
in the cluster tree) should be given similar identifying
patterns. Another approach is a standard decision tree that
considers the attributes of target profile P one at a time until
it can identify the appropriate cluster. If profiles are large,
this may be more rapid than considering all attributes. A hybrid
approach to searching uses distance measurements as described
above to navigate through the top few levels of the hierarchical
cluster tree, until it reaches a cluster of intermediate size
whose profile is similar to target profile P, and then continues
by using a decision tree specialized to search for low-level
subclusters of that intermediate cluster.

One use of these searching techniques is to search for target
objects that match a search profile from a user's search profile
set. This form of searching is used repeatedly in the news
clipping service, active navigation, and virtual community
service applications, described below. Another use is to add a
new target object quickly to the cluster tree. An existing
cluster that is similar to the new target object can be located
rapidly, and the new target object can be added to this cluster.
If the object is beyond a certain threshold distance from the
cluster center, then it is advisable to start a new cluster.
Several variants of this incremental clustering scheme can be
used, and can be built using variants of subroutines available in
advanced statistical packages. Note that various methods can be
used to locate the new target objects that must be added to the
cluster tree, depending on the architecture used. In one method,
a "webcrawler" program running on a central computer periodically
scans all servers in search of new target objects, calculates the
target profiles of these objects, and adds them to the
hierarchical cluster tree by the above method. In another,
whenever a new target object is added to any of the servers, a
software "agent" at that server calculates the target profile and
adds it to the hierarchical cluster tree by the above method.

Rapid Profiling

In some domains, complete profiles of target objects are not
always easy to construct automatically. When target objects are
wallpaper patterns, for example, an attribute such as "genre" (a
single textual term such as "Art-Deco," "Children's," "Rustic,"
etc.) may be a matter of judgment and opinion, difficult to
determine except by consulting a human. More significantly, if
each wallpaper pattern has an associative attribute that records
the positive or negative relevance feedback to that pattern from
various human users (consumers), then all the association scores
of any newly introduced pattern are initially zero, so that it is
initially unclear what other patterns are similar to the new
pattern with respect to the users who like them. Indeed, if this
associative attribute is highly weighted, the initial lack of
relevance feedback information may be difficult to remedy, due to
a vicious circle in which users of moderate-to-high interest are
needed to provide relevance feedback but relevance feedback is
needed to identify users of moderate-to-high interest.
Fortunately, however, it is often possible in principle to
determine certain attributes of a new target object by
extraordinary methods, including but not limited
processor, which adds it to the list to methods that consult a human. For example, the system 

. Assuming mat low4evd clusters have been already been can in principle determine the genre of a wallpaper pattern 
1 ' formed through clustering, there are alternative search meth- by consulting one or more randomly chosen individuals 
ods for identifying the low4evel duster whose profile is 65 from a set of known human experts, while to determine the 
mngr^jipnjtT m a piv ro target profile P. A standard back - numeric association score between a new wallpaper pattern 
propagation neural net is one such method: it should be and a particular user, it can in principle show the pattern to 
that user and obtain relevance feedback. Since such requests inconvenience people, however, it is important not to determine all difficult attributes this way. Only the ones that are most important for purposes of classifying the document should be determined. "Rapid profiling" is a method for selecting those numeric attributes that are most important to determine. (Recall that all attributes can be decomposed into numeric attributes, such as association scores or term scores.) First, a set of existing target objects that already have complete or largely complete profiles are clustered using a k-means algorithm. Next, each of the resulting clusters is assigned a unique identifying number, and each clustered target object is labeled with the identifying number of its cluster. Standard methods then allow construction of a single decision tree that can determine any target object's cluster number, with substantial accuracy, by considering the attributes of the target object one at a time. Only attributes that can, if necessary, be determined for any new target object are used in the construction of this decision tree. To profile a new target object, the decision tree is traversed downward from its root as far as is desired. The root of the decision tree considers some attribute of the target object. If the value of this attribute is not yet known, it is determined by a method appropriate to that attribute. For example, if the attribute is the association score of the target object with user #4589, then relevance feedback (to be used as the value of this attribute) is solicited from user #4589. One way to do this is the ruse of adding the possibly uninteresting target object to a set of objects that the system recommends to the user's attention, in order to find out what the user thinks of it. Once the root attribute is determined, the rapid profiling method descends the decision tree by one level, choosing one of the decision subtrees of the root in accordance with the determined value of the root attribute. The root of this chosen subtree considers another attribute of the target object, whose value is likewise determined by an appropriate method. The process can be repeated to determine as many attributes as desired, […] burden of determining too many attributes.

It should be noted that the rapid profiling method can be used to identify important attributes in any sort of profile, not just profiles of target objects. In particular, recall that the disclosed method for determining topical interest through similarity requires users as well as target objects to have profiles. New users, like new target objects, may be profiled or partially profiled through the rapid profiling process. For example, user profiles may include an associative attribute that records the user's relevance feedback on all target objects in the system. In that case, the rapid profiling procedure can rapidly form a rough characterization of a new user's interests by soliciting the user's feedback on a small number of significant target objects. It may also determine a small number of other key attributes of the new user, by on-line queries, telephone surveys, or other means. Once the new user has been partially profiled in this way, the methods disclosed above predict that the new user's interests resemble the known interests of other users with similar profiles. In a variation, each user's user profile is subdivided into two sets of attributes. The long-term attributes include demographic characteristics. The short-term attributes help to identify the user's temporary desires and emotional state; an example is the user's textual or multiple-choice answers to questions whose answers reflect the user's mood. A subset of the user's long-term attributes are determined when the user first registers with the system, through the use of a rapid profiling tree of long-term attributes. In addition, each time
the user logs on to the system, a subset of the user's short-term attributes are additionally determined, through the use of a separate rapid profiling tree that asks about short-term attributes.

Market Research

A technique similar to rapid profiling is of interest in market research (or voter research). Suppose that the target objects are consumers. A particular attribute in each target profile indicates whether the consumer described by that target profile has purchased product X. A decision tree can be built that attempts to determine what value a consumer has for this attribute, by consideration of the other attributes in the consumer's profile. This decision tree may be traversed to determine whether additional users are likely to purchase product X. More generally, the top few levels of the decision tree describe the most significant characteristics of consumers of product X. This information is valuable to advertisers who are planning mass-market or direct-mail campaigns.

Similar information can alternatively be extracted from a collection of consumer profiles without recourse to a decision tree. This is done by considering attributes one at a time and identifying those attributes on which product X's consumers differ significantly from its non-consumers. These techniques serve to characterize consumers of a particular product. They can be applied equally well to voter research or other survey research, where the objective is to characterize those individuals, from a given set of surveyed individuals, who favor a particular candidate, hold a particular opinion, belong to a particular demographic group, or have some other set of distinguishing attributes. Researchers may wish to purchase batches of analyzed or unanalyzed user profiles from which personal identifying information has been removed. As with any statistical database, statistical conclusions can be drawn, and relationships between attributes can be elucidated using knowledge discovery techniques which are well known in the art.

The following section describes the preferred computer and network architecture for implementing the methods described in this patent.

Electronic Media System Architecture

FIG. 1 illustrates in block diagram form the overall architecture of an electronic media system, known in the art. The system for customized electronic identification of desirable objects of the present invention can be used in this electronic media system to provide user customized access to the target objects that are available through it. In particular, the electronic media system comprises a data communication facility that interconnects a plurality of users with a number of information servers. The users are typically individuals whose personal computers (terminals) T1-Tn are connected to a telecommunication network N via a data communications link, such as a modem and a telephone connection established in well-known fashion. User information access software is resident on the user's personal computer. It serves to communicate over the data communications link and the telecommunication network N with one of the plurality of network vendors V1-Vk (America Online, Prodigy, CompuServe, other private companies or even universities), who provide data interconnection service with selected ones of the information servers I1-Im. By use of the user information access software, the user can interact with the information servers I1-Im to request and obtain access to data that resides on mass storage systems SS1-SSm that are part of the information server apparatus. New data is input to this
system by users via their personal computers T1-Tn. It is also input by commercial information services, which populate their mass storage systems SS1-SSm with commercial data. Each user terminal T1-Tn and each of the information servers I1-Im has a phone number or IP address on the network N. These addresses enable a data communication link to be established between a particular user terminal T1-Tn and the selected information server I1-Im. A user's electronic mail address also uniquely identifies the user and the user's network vendor V1-Vk, in an industry-standard format such as username@aol.com or username@netcom.com. The network vendors V1-Vk provide access passwords for their subscribers (selected users), through which the users can access the information servers I1-Im. The subscribers pay the network vendors V1-Vk for the access services on a fee schedule that typically includes a monthly subscription fee and usage-based charges. A difficulty with this system is that there are numerous information servers I1-Im located around the world. Each of them provides access to a set of information of differing format, content and topics, through a cataloging system that is typically unique to the particular information server I1-Im. […] In the terminology of this patent, each target object is associated with a unique file. For target objects that are informational in nature and can be digitally represented, the file directly stores the informational content of the target object. For target objects that are not stored electronically, such as purchasable goods, the file contains an identifying description of the target object. Target objects stored electronically as text files can include commercially provided news articles, published documents, letters, user-generated documents, descriptions of physical objects, or combinations of these classes of data. The organization of the files containing the information, and the native format of the data contained in files of the same conceptual type, may vary by information server I1-Im.

Thus, a user can have difficulty locating files that contain the desired information, because the information may be contained in files whose information server cataloging does not enable the user to locate them. Furthermore, there is no standard catalog that defines the presence and services provided by all information servers I1-Im. A user therefore does not have simple access to information. The user must expend a significant amount of time and energy to extract a segment of the information that may be relevant to the user from the plethora of information that is generated and populated on this system. Even if the user commits the necessary resources to this task, existing information retrieval processes lack the accuracy and efficiency to ensure that the user obtains the desired information. It is obvious that within the constructs of this electronic media system, the three modules of the system for customized electronic identification of desirable objects can be implemented in a distributed manner. Various modules may even be implemented on and/or by different vendors within the electronic media system. For example, the information servers I1-Im can include the target profile generation module. The network vendors V1-Vk may implement the user profile generation module, the target profile interest summary generation module, and/or the profile processing module. A module can itself be implemented in a distributed manner, with numerous nodes present in the network N, each node serving a population of users in a particular geographic area. The totality of these nodes comprises the functionality of the particular module. Various other partitions of the modules and their functions are possible. The examples provided herein are illustrative and are not intended to limit the scope of the claimed invention. For the purposes of pseudonymous creation and update of users' target profile interest summaries (as described below), the vendors V1-Vk may be augmented with some number of proxy servers. These provide a mechanism for ongoing pseudonymous access and profile building through the method described herein. At least one trusted validation server must be in place to administer the creation of pseudonyms in the system.

An important characteristic of this system for customized electronic identification of desirable objects is its responsiveness, since the system is intended for interactive use. The utility of the system grows with the number of users, since more users means more possible consumer/product relationships between users and target objects. A system that serves a large group of users must maintain interactive performance. The disclosed method for profiling and clustering target objects and users can in turn be used to optimize the distribution of data among the members of a virtual community, and through a data communications network, based on users' target profile interest summaries.

Network Elements and System Characteristics

The various processors interconnected by the data communication network N as shown in FIG. 1 can be divided into two classes and grouped as illustrated in FIG. 2: clients and servers. The clients C1-Cn are individual users' computer systems, which are connected to servers S1-S5 at various times via data communications links. Each of the clients Ci is typically associated with a single server Sj, but this association can change over time. The clients C1-Cn both interface with users and produce and retrieve files to and from servers. The clients C1-Cn are not necessarily continuously on-line, since they typically serve a single user and can be movable systems, such as laptop computers, which can be connected to the data communications network N at any of a number of locations. Clients could also be a variety of other computers, such as computers and kiosks that provide access to customized information as well as targeted advertising to many users, where the users identify themselves with passwords or with smart cards. A server Si is a computer system that is presumed to be continuously on-line. It functions both to collect files from various sources on the data communication network N for access by local clients C1-Cn and to collect files from local clients C1-Cn for access by remote clients. The server Si is equipped with persistent storage, such as a magnetic disk data storage medium, and the servers are interconnected with other servers via data communications links. The data communications links can be of arbitrary topology and architecture. For simplicity, they are described herein as point-to-point links or, more precisely, as virtual point-to-point links. The servers S1-S5 comprise the network vendors V1-Vk as well as the information servers I1-Im of FIG. 1. The functions performed by these two classes of modules can be merged to a greater or lesser extent in a single server Si, or distributed over a number of servers in the data communication network N. Before proceeding with the description of the preferred embodiment of the invention, a number of terms are defined.

FIG. 3 illustrates in block diagram form a representation of an arbitrarily selected network topology for a plurality of servers A-D. Each server is interconnected to at least one other server and typically also to a plurality of clients p-s. Servers A-D are interconnected by a collection of point-to-point data communications links, and server A is connected
to client r, server B is connected to clients p-q, and server D is connected to client s. Servers transmit encrypted or unencrypted messages amongst themselves. A message typically contains the textual and/or graphic information stored in a particular file. It also contains data describing the type and origin of this file, the name of the server that is supposed to receive the message, and the purpose for which the file contents are being transmitted. Some messages are not associated with any file, but are sent by one server to other servers for control reasons, for example to request transmission of a file or to announce the availability of a new file. Messages can be forwarded by a server to another server. For example, server A can transmit a message to server D via a relay node of either server C or servers B-C. It is generally preferable to have multiple paths through the network, with each path characterized by its performance capability and cost, so that the network N can optimize traffic routing.
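The message fields and cost-based relay routing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation, and several details are assumptions. The `Message` field names are hypothetical labels for the items the text lists. The link set A-B, A-C, B-C, C-D is inferred from the relay example. The link costs are invented. Dijkstra's shortest-path algorithm is our own choice of a way to "optimize traffic routing"; the text does not name an algorithm.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Message:
    """Fields a message carries per the text; the names are illustrative."""
    origin: str            # server where the file (or request) originates
    destination: str       # server that is supposed to receive the message
    file_type: str         # type of the carried file; "" for a control message
    purpose: str           # why the contents are being transmitted
    payload: bytes = b""   # file contents (possibly encrypted); empty for control

    @property
    def is_control(self) -> bool:
        # Control messages (e.g. "request file", "new file available") carry no file.
        return not self.payload


# Hypothetical per-link costs for the FIG. 3 servers. The edges are inferred from
# the example in which A reaches D via relay C or via relays B and C.
LINKS = {("A", "B"): 1.0, ("A", "C"): 5.0, ("B", "C"): 1.0, ("C", "D"): 1.0}


def adjacency(links):
    """Turn undirected point-to-point links into a neighbor map."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    return graph


def cheapest_route(links, src, dst):
    """Dijkstra's algorithm: return (total_cost, path) of the cheapest path."""
    graph = adjacency(links)
    queue = [(0.0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, step in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    raise ValueError(f"no route from {src} to {dst}")


request = Message(origin="A", destination="D", file_type="", purpose="request file")
print(request.is_control)                                         # True
print(cheapest_route(LINKS, request.origin, request.destination))  # (3.0, ['A', 'B', 'C', 'D'])
```

With these invented costs, the relay path through B and C (cost 3.0) beats the direct relay through C (cost 6.0). Lowering the A-C cost to 1.0 makes the route A-C-D (cost 2.0) preferable. This shows how per-path cost information lets the network choose among the multiple paths the text recommends.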

Proxy Servers and Pseudonymous Transactions 

While the method of using target profile interest summa- 
ries presents many advantages to both target object provid- 
ers and users, there are important privacy issues for both 
users and providers that must be resolved if the system is to 
be used freely and without inhibition by users without fear 
of invasion of privacy. It is likely that users want some,
if not all, of the user-specific information in their user
profiles and target profile interest summaries remain 
confidential, to be disclosed only under certain circum- 
stances related to certain types of transactions and according 
to their personal wishes for differing levels of confidentiality 
regarding their purchases and expressed interests. 

However, complete privacy and inaccessibility of user transactions and profile summary information would hinder implementation of the system for customized electronic identification of desirable objects. It would also deprive the user of many of the advantages derived from the system's use of user-specific information. In many cases, complete and total privacy is not desired by all parties to a transaction. For example, a buyer may want to be targeted for certain mailings that describe products related to his or her interests. Likewise, a seller may want to target users who are predicted to be interested in the goods and services that the seller provides. Indeed, the usefulness of the technology described herein depends on the ability of the system to collect and compare data about many users and many target objects. A pseudonym offers a compromise between total user anonymity and total public disclosure of the user's search profiles or target profile interest summary. A pseudonym is an artifact that allows a service provider to communicate with users and to build and accumulate records of their preferences over time. At the same time, the service provider remains ignorant of the users' true identities, so that users can
keep their purchases or preferences private. A second and equally important requirement of a pseudonym system is that it provide for digital credentials. A credential guarantees that the user represented by a particular pseudonym has certain properties. Credentials may be granted on the basis of the results of activities and transactions conducted by means of the system for customized electronic identification of desirable objects. They may also be granted on the basis of other activities and transactions conducted on the network N of the present system, or on the basis of users' activities outside of network N. For example, before agreeing to transact business with a user, a service provider may require proof that the purchaser has sufficient funds on deposit at his or her bank, which might not be on a network. The user, therefore, must provide the service provider with proof of funds (a credential) from the bank, while still not disclosing the user's true identity to the service provider.

Our method solves the above problems by combining two elements. The first is the pseudonym granting and credential transfer methods taught by D. Chaum and J. H. Evertse in the paper titled "A secure and privacy-protecting protocol for transmitting personal information between organizations." The second is the implementation of a set of one or more proxy servers distributed throughout the network N. Each proxy server, for example S2 in FIG. 2, is a server which communicates with clients and other servers S5 in the network. It does so either directly or through anonymizing mix paths, as detailed in the paper by D. Chaum titled "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms," published in Communications of the ACM, Volume 24, Number 2, February 1981. Any server in the network N may be configured to act as a proxy server in addition to its other functions. Each proxy server provides service to a set of users, which is termed the "user base" of that proxy server. A given proxy server provides three sorts of service to each user U in its user base, as follows:

1. The first function of the proxy server is to bidirectionally transfer communications between user U and other entities, such as information servers (possibly including the proxy server itself) and/or other users. Specifically, let S denote the server that is directly associated with user U's client processor. The proxy server communicates with server S (and thence with user U) in one of two ways. It may use anonymizing mix paths that obscure the identity of server S and user U, in which case the proxy server knows user U only through a secure pseudonym. Alternatively, it may use a conventional virtual point-to-point connection, in which case the proxy server knows user U by user U's address at server S; this address may be regarded as a non-secure pseudonym for user U.

2. A second function of the proxy server is to record user-specific information about user U. This includes the target profile interest summary, a list of access control instructions (described below), and a set of one-time return addresses provided by user U that can be used to send messages to user U without knowing user U's true identity. […]

3. A third function of the proxy server is to act as a selective forwarding agent for unsolicited communications that are addressed to user U. The proxy server forwards some such communications to user U and rejects others, in accordance with the access control instructions specified by user U.

Our combined method allows a given user to use either a single pseudonym in all transactions where he or she wishes to remain pseudonymous, or different pseudonyms for different types of transactions. In the latter case, each service provider might transact with the user under a different pseudonym for the user. More generally, a coalition of
service providers, all of whom match users with the same 
genre of target objects, might agree to transact with the user 
using a common pseudonym, so that the target profile 
interest summary associated with that pseudonym would be 
complete with respect to said genre of target objects. When 
a user employs several pseudonyms in order to transact with 
different coalitions of service providers, the user may freely 
choose a proxy server to service each pseudonym; these 
proxy servers may be the same or different.
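The trade-off described above can be illustrated with a small data structure. A user keeps one pseudonym per coalition of service providers, and picks a proxy server for each pseudonym. The following is a minimal sketch under stated assumptions. `PseudonymWallet` and its methods are hypothetical names not taken from the text. The topic counter is only a crude stand-in for a real target profile interest summary. Key pairs, credentials and mix paths are deliberately omitted.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PseudonymRecord:
    """One pseudonym, the proxy server that services it, and its interests."""
    pseudonym: str
    proxy_server: str
    # Crude stand-in for a target profile interest summary: counts of topics
    # the user showed interest in while acting under this pseudonym.
    interests: Counter = field(default_factory=Counter)


class PseudonymWallet:
    """Hypothetical client-side bookkeeping: one pseudonym per coalition."""

    def __init__(self) -> None:
        self._by_coalition: dict[str, PseudonymRecord] = {}

    def register(self, coalition: str, pseudonym: str, proxy_server: str) -> None:
        # The user freely chooses the proxy server that services each pseudonym.
        self._by_coalition[coalition] = PseudonymRecord(pseudonym, proxy_server)

    def record_interest(self, coalition: str, topic: str) -> None:
        # Transactions with any member of a coalition feed one shared summary, so
        # that summary is complete for the coalition's genre of target objects.
        self._by_coalition[coalition].interests[topic] += 1

    def view_for(self, coalition: str) -> tuple[str, dict[str, int]]:
        # A coalition sees only its own pseudonym and summary; nothing returned
        # here links it to the user's pseudonyms in other coalitions.
        record = self._by_coalition[coalition]
        return record.pseudonym, dict(record.interests)


wallet = PseudonymWallet()
wallet.register("news", pseudonym="P1", proxy_server="S2")
wallet.register("wallpaper", pseudonym="P2", proxy_server="S2")  # same proxy allowed
wallet.record_interest("news", "politics")
wallet.record_interest("news", "politics")
wallet.record_interest("wallpaper", "Art-Deco")
print(wallet.view_for("news"))       # ('P1', {'politics': 2})
print(wallet.view_for("wallpaper"))  # ('P2', {'Art-Deco': 1})
```

Here the "news" coalition accumulates a complete record under P1, while the "wallpaper" coalition sees only P2 and its own summary. Both pseudonyms happen to share proxy server S2, which the text permits.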

From the service provider's perspective, our system pro- 
vides security, in that it can guarantee that users of a service 
are legitimately entitled to the services used, and that no user is using multiple pseudonyms to communicate with the same provider. This uniqueness of pseudonyms is important for the purposes of this application. The transaction information gathered for a given individual must represent a complete and consistent picture of a single user's activities with respect to a given service provider or coalition of service providers. Otherwise, a user's target profile interest summary and user profile would not be able to represent the user's interests to other parties as completely and accurately as possible.

The service provider must have a means of protection from users who violate previously agreed upon terms of service. For example, if a user with a given pseudonym engages in activities that violate the terms of service, the service provider should be able to take action against the user. Such action could include denying the user service and blacklisting the user from transactions with other parties that the user might be tempted to defraud. This situation might occur when a user employs a service provider for illegal activities or defaults in payments to the service provider. The paper titled "Security without identification: Transaction systems to make Big-Brother obsolete", published in the Communications of the ACM, 28(10), October 1985, pp. 1030-1044, is incorporated herein. Its method provides a mechanism to enforce protection against this type of behavior through the use of resolution credentials. These are credentials that are periodically provided to individuals, contingent upon their behaving consistently with the agreed upon terms of service between the user and the information provider and network vendor entities (such as regular payment for services rendered, civil conduct, etc.). For the user's safety, if the issuer of a resolution credential refuses to grant this credential to the user, the refusal may be appealed to an adjudicating third party.

The integrity of the user profiles and target profile interest summaries stored on proxy servers is important. A seller may rely on such user-specific information to deliver promotional offers or other material to a particular class of users but not to other users; in that case the user-specific information must be accurate and untampered with in any way. The user may likewise wish to ensure that other parties do not tamper with the user's user profile and target profile interest summary, since such modification could degrade the system's ability to match the user with the most appropriate target objects. This is done by having the user apply digital signatures to the control messages the user sends to the proxy server. Each pseudonym is paired with a public cryptographic key and a private cryptographic key, where the private key is known only to the user who holds that pseudonym. When the user sends a control message to a proxy server under a given pseudonym, the proxy server uses the pseudonym's public key to verify that the message has been digitally signed by someone who knows the pseudonym's private key. This prevents other parties from masquerading as the user.

Our approach, as disclosed in this application, improves on the prior art in privacy-protected pseudonymy for network subscribers, such as that taught in U.S. Pat. No. 5,245,656. That patent provides for a name translator station to act as an intermediary between a service provider and the user. U.S. Pat. No. 5,245,656 provides that the information transmitted between the end user U and the service provider be doubly encrypted. However, the fact that a relationship exists between user U and the service provider is known to the name translator. This fact could be used to compromise user U, for example if the service provider specializes in the provision of content that is not deemed acceptable by user U's peers. The method of U.S. Pat. No. 5,245,656 also has three further gaps relative to this application. It omits a method for the convenient updating of pseudonymous user profile information. It does not provide for assurance of unique and credentialed registration of pseudonyms from a credentialing agent. It does not provide a means of access control to the user based on profile information and conditional access, as will be described later. The method described by Loeb et al. also makes no provision for credentials. Credentials might be used, for example, to authenticate a user's right to access particular target objects, such as target objects that are available only upon payment of a subscription fee, or target objects that are intended to be unavailable to younger users.

Proxy Server Description

A user may wish to ensure that some or all of the information in the user's user profile and target profile interest summary remains dissociated from the user's true identity. To do so, the user employs as an intermediary any one of a number of proxy servers available on the data communication network N of FIG. 2 (for example, server S2). The proxy servers function to disguise the true identity of the user from other parties on the data communication network N. The proxy server represents a given user either to single network vendors and information servers or to coalitions thereof. A proxy server, e.g. S2, is a server computer with a CPU, main memory, secondary disk storage and a network communication function. It also has a database function, which retrieves the target profile interest summary and access control instructions associated with a particular pseudonym P representing a particular user U. The proxy server performs bi-directional routing of commands, target objects and billing information between the user at a given client (e.g. C3) and other network entities, such as network vendors V1-Vk and information servers I1-Im. Each proxy server maintains an encrypted target profile interest summary associated with each allocated pseudonym in its pseudonym database D. The actual user-specific information and the associated pseudonyms need not be stored locally on the proxy server. They may instead be stored in a distributed fashion and be remotely addressable from the proxy server via point-to-point connections.

The proxy server supports two types of bi-directional connections: point-to-point connections, and pseudonymous connections through mix paths as taught by D. Chaum in the paper titled "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms", Communications of the ACM, Volume 24, Number 2, February 1981. The normal connections between the proxy server and information servers are accomplished through the point-to-point connection protocols provided by network N, as described in the "Electronic Media System Architecture" section of this application. An example is a connection between proxy server S2 and information server S4 in FIG. 2. The normal type of point-to-point connection may be used between S2 and S4 because the dissociation of the user and the pseudonym need only occur between the client C3 and the proxy server S2, where the pseudonym used by the user is available. Knowing that an information provider such as S4 communicates with a given pseudonym P on proxy server S2 does not compromise the true identity of user U. The bidirectional connection between the user and the proxy server S2 can also be a normal point-to-point connection. If the user desires, however, it may instead be made anonymous and secure through the consistent use of an anonymizing mix protocol, as taught by D. Chaum in the paper titled "Untraceable Electronic Mail, Return Addresses, and Digital

by the user; and that credentials may not be transferred from

Pseudonyms'*. Canmwnkations of the ACM. Volume 24, one user's pseudonym to a different user's pseudonym. 

Number 2, February 1981. This mix procedure provides Finally, the method provides for expiration of credentials 

untraceable secure anonymous m*fl between to parties with *nd for the i ssu ance , of IsLack marks'* against Individuals 

blind return addresses through a set of fewanhng and return 5 w ^o do not act according to the terms of service that they are 

routing servers termed "mixes". The mix routing protocol extended. This is done through the resolution credential 

as taught in the Chaum paper, is used with the proxy server mech a nism as described in Chaum* s work, in which reso- 

S2tor*xmdearegistiyofpe^ " issued P««dically by organizations to pseud- 

can be empioyed by users omer than user UJ^imation OT ^^J^ m J^ ******** user is not issued flus 

providersn-Irn. by vendors Vl-Vk and by other proxy 10 resolution credential by a parhailar^m^tion « cc^rhon 

..... - ^ > of organization, then this user cannot have it available to be 

servers to coinmurucate with the users in the proxy server s tiasisfcacd to other pseudonyms which he uses with other 

user base on a continuing basis. The security provided by ^ mizZit i OT ^ Therefore, the user cannot convince these 

this mix path protocol is distributed and resistant to traffic omtx organizations that he has acted accordance with terms 

analysis attacks and other known forms of analysis which Q f service in other dealings. If mis is the case, then the 

may be used by malicious parties totry and ascertain the true is organization can use this lack of resolution credential to 

identity of a pseudonym bearer. Breaking the protocol infer that the user is not in good standing in his other 

requires a large number of parties to maliciously collude or dealings. In one approach organizations (or other users) may 

be cryptographically compromised. In addition an extension issue a list of quality related credentials based upon the 

to the method is taught where the user can include a return experience of transaction (or interaction) with the user 

path definition in the message so the information server S4 20 which may act similarly to a letter of recnmmenrlatton as in 

can return the requested information to the user's dient a resume, If such a credential is issued from multiple 

processor C3. We «tfli»t mis feature in a novel fashion to organizations, their values become averaged. In an aherna- 

provide for access and reachability control under user and tive variation organizations may be issued credentials from 

proxy server control users such as customers which may be used to indicate to 

Validation and Allocation of a Unique Pseudonym 25 otbcr r * lturc ttscrs <juatity of service which can be expected 

Chaum* s pseudonym and credential issuance system, as subsequent users on the basis of various criteria, 

described in a^bHcation by D. Chaum and J. H_ Bvertse, * °^ ^^m^on, a pseudonym is a data record 

titled "A sccurTand jrivacy^ccting protocol for trans- of two fields. The first field specifies the address 

mitting personal i^^^^^^o^ has £^^5^ 

^^J^T** 30 r^^^nSS^ is^ianTwith a pari cliar 
system. The system allows for inmytduals to use different U5cr enters take the form of public-key mgital signa- 
pseudonyms with different organizations (such as banks and tures computed on this number, and the number itself is 
coalitions of service providers). The organizations which are issued by a pseudonym administering server Z, as depicted 
presented with a pseudonym have no more information m fjq. 2. and detailed In a generic form in the paper by D. 
about the individual than the pseudonym itself and a record 35 Chaum and J. H. Bvertse. titled "A secure and privacy- 
of previous transactions carried out under that pseudonym. protecting protocol for tr ansmitting personal information 
Additionally, credentials, which represent facts about a between orgamzations,* It is possible to send mfecmation to 
pseudonym mat an organization is willing to certify, can be the user holding a given pseudonym, by enveloping the 
granted to a particular pseudonym, and transferred to other information in a control message that specifies the pseud- 
pseudonyms that the same user employs. For, example, the 40 onym and is addressed to the proxy server that is named in 
user can use different pseudonyms with different organiza- the first field of the pseudonym; the proxy server may 
tions (or disjoint sets of organizations), yet stOl present forward the information to the user upon receipt of the 
credentials that were granted by one organization, under one control message. 

pseudonym, in order to transact with another organization While the user may use a single pseudonym for all 

under another pseudonym, without revealing that the two 45 transactions, in me more general case a user has a set of 

pseudonyms correspond to the same user. Credentials may several pseudonyms, each of which represents the user in his 

be granted to provide assurances regarding the pseudonym or her interactions with a single provider or coalition of 

bearer's age. financial status, legal status, and the like. For services designated for transactions the pseudonym set is 

example, credentials signifying "legal adult" may be issued designated for transactions with a different coalition of 

to a pseudonym based on information known about the so related service providers, and the pseudonyms used with o 

corresponding user by the given Is suing organization. Then, one provider or coalition of providers cannot be linked to the 

when the credential is transferred to another pseudonym that pseudonyms used with omer disjoint coalitions of providers, 

represents the user to another disjoint organization^ presen- All of the user's transactions with a given coalition can be 

tation of this credential on the omer pseudonym can be taken linked by virtue of the fact that they are conducted under the 

as proof of legal adulthood, which might satisfy a condition 55 same pseudonym, and therefore can be combined to define 

of terms of service. Credential-issuing organizations may a unified picture, in the form of a user profile and a target 

also certify particular facts about a user's demographic profile interest summary, of the user's interests vis-a-vis the 

profile or target profile interest summary, for example by service or services provided by said coalition. There are 

granting a credential that asserts **the bearer of this pseud- other ciraimstances for which the use of a pseudonym may 

onym is either well-read or is middle-aged and works for a 60 be useful and the present description is in no way intended 

large company**; by presenting this credential to another to limit the scope of the rim'mnd invention for example, the 

entity, the user can prove eligibility for (say) a discount previously described rapid profiling tree could be used to 

without revealing the user's personal data to that entity. pscudonymously acquire information about the user which 

Additionally, the method taught by Chaum provides for is considered by the user to be sensitive such as that 

assurances that no m dividual may correspond with a given 65 infesmation which is of interest to such entities as insurance 

organization or coalition of organizations using more than companies, medical specialists, family counselors or dating 

one pseudonym; that credentials may not be feasibly forged services. 
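The two signature uses described above, a control message signed with the pseudonym's private key and checked by the proxy server with the matching public key, and a credential computed as a public-key signature on the pseudonym's number, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses textbook RSA with tiny, insecure parameters and SHA-256 as the digest, and every name (`Pseudonym`, `RSAKey`, `sign`, `verify`, the address `proxy-s2.example`) is hypothetical. It does not attempt the unlinkable transfer of credentials between a user's pseudonyms that Chaum's protocol provides.

```python
"""Toy sketch of a pseudonym record, a signed control message, and a credential.

Assumptions (not from the source): textbook RSA with tiny textbook primes,
SHA-256 reduced mod n as the digest, and all names below. NOT secure; it only
illustrates who signs what and who verifies it.
"""
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RSAKey:
    n: int    # modulus
    exp: int  # public exponent e, or private exponent d


@dataclass(frozen=True)
class Pseudonym:
    proxy_address: str  # first field: the proxy server that holds the pseudonym
    number: int         # second field: the number issued by server Z


def digest(data: bytes, n: int) -> int:
    """Hash arbitrary bytes to an integer in [0, n)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private_key: RSAKey) -> int:
    """Textbook RSA signature: digest^d mod n."""
    return pow(digest(data, private_key.n), private_key.exp, private_key.n)


def verify(data: bytes, signature: int, public_key: RSAKey) -> bool:
    """Accept iff signature^e mod n equals the digest of the data."""
    return pow(signature, public_key.exp, public_key.n) == digest(data, public_key.n)


# Key pair held by the user for this pseudonym (p=47, q=59, e=17, d=157).
USER_PK, USER_SK = RSAKey(2773, 17), RSAKey(2773, 157)
# Key pair of a credential-issuing organization (p=61, q=53, e=17, d=2753).
ISSUER_PK, ISSUER_SK = RSAKey(3233, 17), RSAKey(3233, 2753)

p = Pseudonym(proxy_address="proxy-s2.example", number=918273645)

# Control message: signed with the pseudonym's private key, checked by the proxy.
control = b"update target profile interest summary"
control_sig = sign(control, USER_SK)
print(verify(control, control_sig, USER_PK))  # True

# Credential: the issuer's signature computed on the pseudonym's number.
number_bytes = p.number.to_bytes(8, "big")
legal_adult = sign(number_bytes, ISSUER_SK)
print(verify(number_bytes, legal_adult, ISSUER_PK))  # True
```

A production system would use a standardized signature scheme with padding (for example RSA-PSS) and realistic key sizes instead of this textbook construction.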
Detailed Protocol

In our system, the organizations that the user U interacts with are the servers S1-Sn on the network N. However, rather than directly corresponding with each server, the user employs a proxy server, e.g., S2, as an intermediary between the local server of the user's own client and the information provider or network vendor. Mix paths, as described by D. Chaum in the paper titled "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms," Communications of the ACM, Volume 24, Number 2, February 1981, allow for untraceability and security between the client, such as C3, and the proxy server, e.g., S2. Let S(M, K) represent the digital signature of message M with key K, as detailed in a paper by R. Rivest, A. Shamir, and L. Adleman titled "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, Volume 21, Number 2, February 1978, pages 120-126. Once a user applies to server Z for a pseudonym P and is granted a pseudonym signed by Z, the following protocol takes place to establish an entry for the user U in the proxy server S2's database D.

1. The user now sends proxy server S2 the pseudonym, which has been signed by Z to indicate the authenticity and uniqueness of the pseudonym. The user also generates a key pair PKp, SKp for use with the granted pseudonym, where SKp is the private key associated with the pseudonym and PKp is the public key associated with the pseudonym. The user forms a request to establish pseudonym P on proxy server S2 by sending the signed pseudonym S(P, SKz) to the proxy server S2, along with a request to create a new database entry, indexed by P, and the public key PKp. The user envelopes this message and transmits it to proxy server S2 through an anonymizing mix path, along with an anonymous return envelope header.

2. The proxy server S2 receives the database entry creation request and the associated certified pseudonym message. The proxy server S2 checks that the requested pseudonym P is signed by server Z and, if so, grants the request and creates a database entry for the pseudonym, storing the user's public key PKp to ensure that only the user U can make requests in the future using pseudonym P.

3. The user's database entry consists of a user profile as detailed herein, a target profile interest summary as detailed herein, and a Boolean combination of access control criteria as detailed below, along with the associated public key for the pseudonym P.

4. At any time after the entry for pseudonym P is established, the user U may provide proxy server S2 with credentials on that pseudonym, provided by third parties, which credentials make certain assertions about that pseudonym. The proxy server may verify those credentials and make appropriate modifications to the user's profile as required by these credentials, such as recording the user's new demographic status as an adult. It may also store those credentials, so that it can present them to service providers on the user's behalf.

The above steps may be repeated, with either the same or a different proxy server, each time user U requires a new pseudonym for use with a new and disjoint coalition of providers. In practice there is an extremely small probability that a given pseudonym has already been allocated, due to the random nature of the pseudonym generation process carried out by Z. If this highly unlikely event occurs, then the proxy server S2 may reply to the user with a signed message indicating that the generated pseudonym has already been allocated and asking for a new pseudonym to be generated.

Pseudonymous Control of an Information Server

Once a proxy server S2 has authenticated and registered a user's pseudonym, the user may begin to use the services of the proxy server S2 in interacting with other network entities such as service providers, as exemplified by server S4 in FIG. 2, an information service provider node connected to the network. The user controls the proxy server S2 by forming digitally encoded requests that the user subsequently transmits to the proxy server S2 over the network N. The nature and format of these requests will vary, since the proxy server may be used for any of the services described in this application, such as the browsing, querying, and other navigational functions described below.

In a generic scenario, the user wishes to communicate under pseudonym P with a particular information provider or user at address A, where P is a pseudonym allocated to the user and A is either a public network address at a server such as S4 or a pseudonym registered on a proxy server. In one version of this scenario, address A is the address of an information provider, and the user is requesting that the information provider send target objects of interest. The user must form a request R to proxy server S2 that requests proxy server S2 to send a message to address A and to forward the response back to the user. The user may thereby communicate with other parties: either non-pseudonymous parties, in the case where address A is a public network address, or pseudonymous parties, in the case where address A is a pseudonym held by, for example, a business or another user who prefers to operate pseudonymously.

In other scenarios, the request R formed by the user may have different content. For example, request R may instruct proxy server S2 to use the methods described later in this description to retrieve from the most convenient server a particular piece of information that has been multicast to many servers, and to send this information to the user. Conversely, request R may instruct proxy server S2 to multicast to many servers a file associated with a new target object provided by the user, as described below. If the user is a subscriber to the news clipping service described below, request R may instruct proxy server S2 to forward to the user all target objects that the news clipping service has sent to proxy server S2 for the user's attention. If the user is employing the active navigation service described below, request R may instruct proxy server S2 to select a particular cluster from the hierarchical cluster tree and send a list of its subclusters to the user, or to activate a query that temporarily affects proxy server S2's record of the user's target profile interest summary. If the user is a member of a virtual community as described below, request R may instruct proxy server S2 to forward to the user all messages that have been sent to the virtual community.

Regardless of the content of request R, the user, at client C3, initiates a connection to the user's local server S1 and instructs server S1 to send the request R along a secure mix path to the proxy server S2, initiating the following sequence of actions:

1. The user's client processor C3 forms a signed message S(R, SKp), which is paired with the user's pseudonym P and (if the request R requires a response) a secure one-time set of return envelopes, to form a message M. It protects the message M with a multiply enveloped route for the outgoing path. The enveloped routes provide for secure communication between S1 and the proxy server S2. The message M is enveloped in the most deeply nested message and is therefore difficult to recover should the message be intercepted by an eavesdropper.

2. The message M is sent by client C3 to its local server S1, and is then routed by the data communication
network N from server S1 through a set of mixes as dictated by the outgoing envelope set, and arrives at the selected proxy server S2.

3. The proxy server S2 separates the received message M into the request message R, the pseudonym P, and (if included) the set of envelopes for the return path. The proxy server S2 uses pseudonym P to index and retrieve the corresponding record in its database, which record is stored in local storage at the proxy server S2 or on other distributed storage accessible to proxy server S2 via the network N. This record contains a public key PKp, user-specific information, and credentials associated with pseudonym P. The proxy server S2 uses the public key PKp to check that the signed version S(R, SKp) of request message R is valid.

4. Provided that the signature on request message R is valid, the proxy server S2 acts on the request R. For example, in the generic scenario described above, request message R includes an embedded message M1 and an address A to which message M1 should be sent; in this case, proxy server S2 sends message M1 to the server named in address A, such as server S4. The communication is done using signed and optionally encrypted messages over the normal point-to-point connections provided by the data communication network N. When necessary in order to act on embedded message M1, server S4 may exchange, or be caused to exchange, further signed and optionally encrypted messages with proxy server S2, still over normal point-to-point connections, in order to negotiate the release of user-specific information and credentials from proxy server S2. In particular, server S4 may require server S2 to supply credentials proving that the user is entitled to the information requested: for example, proving that the user is a subscriber in good standing to a particular information service, that the user is old enough to legally receive adult material, or that the user has been offered a particular discount (by means of a special discount credential issued to the user's pseudonym).

5. If proxy server S2 has sent a message to a server S4 and 
server S4 has created a response M2 to message M1 to
be sent to the user, then server S4 transmits the 
response M2 to the proxy server S2 using normal 
network point-to-point connections. 

6. The proxy server S2, upon receipt of the response M2,
creates a return message Mr comprising the response 
M2 embedded in the return envelope set that was 
earlier transmitted to proxy server S2 by the user in the 
original message M. It transmits the return message Mr 
along the pseudonymous mix path specified by this 
return envelope set, so that the response M2 reaches the 
user at the user's client processor C3. 

7. The response M2 may contain a request for electronic 
payment to the information server S4. The user may 
then respond by means of a message M3 transmitted by 
the same means as described for message M1 above,
which message M3 encloses some form of anonymous payment. Alternatively, the proxy server may respond automatically with such a payment, which is debited from an account maintained by the proxy server for this user.

8. Either the response message M2 from the information server S4 to the user, or a subsequent message sent by the proxy server S2 to the user, may contain advertising material that is related to the user's request and/or is targeted to the user. Typically, if the user has just retrieved a target object X, then (a) either proxy server S2 or information server S4 determines a weighted set of advertisements that are "associated with" target object X, (b) a subset of this set is chosen randomly, where the weight of an advertisement is proportional to the probability that it is included in the subset, and (c) proxy server S2 selects from this subset just those advertisements that the user is most likely to be interested in. In the variation where proxy server S2 determines the set of advertisements associated with target object X, this set typically consists of all advertisements that the proxy server's owner has been paid to disseminate and whose target profiles are within a threshold similarity distance of the target profile of target object X. In the variation where information server S4 determines the set of advertisements associated with target object X, advertisers typically purchase the right to include advertisements in this set. In either case, the weight of an advertisement is determined by the amount that an advertiser is willing to pay. Following step (c), proxy server S2 retrieves the selected advertising material and transmits it to the user's client processor C3, where it will be displayed to the user, within a specified length of time after it is received, by a trusted process running on the user's client processor C3. When proxy server S2 transmits an advertisement, it sends a message to the advertiser indicating that the advertisement has been transmitted to a user with a particular predicted level of interest. The message may also indicate the identity of target object X. In return, the advertiser may transmit an electronic payment to proxy server S2; proxy server S2 retains a service fee for itself, optionally forwards a service fee to information server S4, and forwards the balance to the user or uses it to credit the user's account on the proxy server.

9. If the response M2 contains or identifies a target object, the passive and/or active relevance feedback that the user provides on this object is tabulated by a process on the user's client processor C3. A summary of such relevance feedback information, digitally signed by client processor C3 with a proprietary private key, is periodically transmitted through a secure mix path to the proxy server S2, whereupon the search profile generation module resident on server S2 updates the appropriate target profile interest summary associated with pseudonym P, provided that the signature on the summary message can be authenticated with the corresponding public key, which is available to all tabulating processes that are ensured to have integrity.

When a consumer enters into a financial relationship with a particular information server based on both parties agreeing to terms for the relationship, a particular pseudonym may be extended for the consumer with respect to the given provider, as detailed in the previous section. When entering into such a relationship, the consumer and the service provider agree to certain terms. However, if the user violates the terms of this relationship, the service provider may decline to provide service to the pseudonym under which it transacts with the user. In addition, the service provider has the recourse of refusing to provide resolution credentials to the pseudonym, and may choose to do so until the pseudonym bearer returns to good standing.

Prefetching of Target Objects

In some circumstances, a user may request access in sequence to many files, which are stored on one or more
information servers. This behavior is common when navigating a hypertext system such as the World Wide Web, or when using the target object browsing system described below.

In general, the user requests access to a particular target object or menu of target objects; once the corresponding file has been transmitted to the user's client processor, the user views its contents and makes another such request, and so on. Each request may take many seconds to satisfy, due to retrieval and transmission delays. However, to the extent that the sequence of requests is predictable, the system for customized electronic identification of desirable objects can respond more quickly to each request by retrieving, or starting to retrieve, the appropriate files even before the user requests them. This early retrieval is termed "pre-fetching" of files.

Pre-fetching of locally stored data has been heavily studied in memory hierarchies, including CPU caches and secondary storage (disks), for several decades. A leader in this area has been A. J. Smith of Berkeley, who identified a variety of schemes and analyzed opportunities using extensive traces in both databases and CPU caches. His conclusion was that general schemes only really paid off where there was some reasonable chance that sequential access was occurring, e.g., in a sequential read of data. As the balances between various latencies in the memory hierarchy shifted during the late 1980's and early 1990's, J. M. Smith and others identified further opportunities for prefetching of both locally stored data and network data. In particular, deeper analysis of patterns in work by BUha showed the possibility of using expert systems for deep pattern analysis that could be used for pre-fetching. Work by J. M. Smith proposed the use of reference history trees to anticipate references in storage hierarchies where there was some historical data. Recent work by Touch and the Berkeley work addressed the case of data on the World Wide Web, where the large size of images and the associated latencies provide extra incentive to pre-fetch; Touch's technique is to pre-send, when large bandwidths permit, HTML storage references embedded in Web pages, and the Berkeley work uses techniques similar to J. M. Smith's reference histories, specialized to the semantics of HTML.

Successful prefetching depends on the ability of the system to predict the next action of the user. In the context of the system for customized electronic identification of desirable objects, it is possible to cluster users into groups according to the similarity of their interests. Any of the well-known pre-fetching methods that collect and utilize aggregate statistics on past user behavior in order to predict future user behavior may then be implemented so as to collect and utilize a separate set of statistics for each cluster of users. In this way, the system generalizes its access pattern statistics from each user to similar users, without generalizing among users who have substantially different interests. The system may further collect and utilize a similar set of statistics that describes the aggregate behavior of all users; in cases where the system cannot confidently make a prediction as to what a particular user will do, because the relevant statistics concerning that user's cluster are derived from only a small amount of data, the system may instead make its predictions based on the aggregate statistics for all users, which are derived from a larger amount of data.

For the sake of concreteness, we now describe a particular instantiation of a prefetching system that both employs these insights and makes its pre-fetching decisions through accurate measurement of the expected cost and benefit of each potential pre-fetch.

Pre-fetching exhibits a cost-benefit tradeoff. Let t denote the approximate number of minutes that prefetched files are retained in local storage (before they are deleted to make room for other pre-fetched files). If the system elects to pre-fetch a file corresponding to a target object X, then the user benefits from a fast response at no extra cost, provided that the user explicitly requests target object X soon thereafter. However, if the user does not request target object X within t minutes of the pre-fetch, then the pre-fetch was worthless, and its cost is an added cost. The first scenario therefore provides benefit at no cost, while the second scenario incurs a cost at no benefit. The system tries to favor the first scenario by pre-fetching only those files that the user will access anyway. Depending on the user's wishes, the system may pre-fetch either conservatively, where it controls costs by pre-fetching only files that the user is extremely likely to request explicitly (and that are relatively cheap to retrieve), or more aggressively, where it also pre-fetches files that the user is only moderately likely to request explicitly, thereby increasing both the total cost and (to a lesser degree) the total benefit to the user.

In the system described herein, pre-fetching for a user U is accomplished by the user's proxy server S. Whenever proxy server S retrieves a user-requested file F from an information server, it uses the identity of this file F and the characteristics of the user, as described below, to identify a group of other files G1 . . . Gk that the user is likely to access soon. The user's request for file F is said to "trigger" files G1 . . . Gk. Proxy server S pre-fetches each of these triggered files Gi as follows:

1. Unless file Gi is already stored locally (e.g., due to a previous pre-fetch), proxy server S retrieves file Gi from an appropriate information server and stores it locally.

2. Proxy server S timestamps its local copy of file Gi as having just been pre-fetched, so that file Gi will be retained in local storage for a minimum of approximately t minutes before being deleted.

If user U (or any other user registered with proxy server S) requests proxy server S to retrieve a file that has been pre-fetched and not yet deleted, proxy server S can then retrieve the file from local storage rather than from another server. In a variation on steps 1-2 above, proxy server S pre-fetches a file Gi somewhat differently, so that pre-fetched files are stored on the user's client processor q rather than on server S:

1. If proxy server S has not pre-fetched file Gi in the past t minutes, it retrieves file Gi and transmits it to user U's client processor q.

2. Upon receipt of the message sent in step 1, client q stores a local copy of file Gi if one is not currently stored.

3. Proxy server S notifies client q that client q should timestamp its local copy of file Gi; this notification may be combined with the message transmitted in step 1, if any.

4. Upon receipt of the message sent in step 3, client q timestamps its local copy of file Gi as having just been pre-fetched, so that file Gi will be retained in local storage for a minimum of approximately t minutes before being deleted.

During the period that client q retains file Gi in local storage, client q can respond to any request for file Gi (by user U or, in principle, any other user of client q) immediately and without the assistance of proxy server S.
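The cluster-based statistics described above, with a fallback to statistics over all users when a cluster's data are too sparse, can be sketched as follows. This is a minimal sketch under stated assumptions: the "well-known pre-fetching method" is taken to be first-order counts of which file is requested after which, and the sparsity threshold `min_observations` and all names (`ClusterPrefetchStats`, `record`, `next_file_probabilities`) are hypothetical.

```python
"""Sketch: per-cluster next-file statistics with fallback to all users.

Assumptions (not from the source): statistics are first-order transition
counts (current file -> next file), and a cluster's counts are "confident"
once they total at least `min_observations`. Both choices are hypothetical.
"""
from collections import Counter, defaultdict
from typing import Dict, Hashable, Tuple


class ClusterPrefetchStats:
    def __init__(self, min_observations: int = 20) -> None:
        self.min_observations = min_observations
        # (cluster id, current file) -> counts of the file requested next
        self.cluster_counts: Dict[Tuple[Hashable, str], Counter] = defaultdict(Counter)
        # current file -> counts of the file requested next, over all users
        self.global_counts: Dict[str, Counter] = defaultdict(Counter)

    def record(self, cluster: Hashable, current_file: str, next_file: str) -> None:
        """Log one observed request sequence current_file -> next_file."""
        self.cluster_counts[(cluster, current_file)][next_file] += 1
        self.global_counts[current_file][next_file] += 1

    def next_file_probabilities(self, cluster: Hashable, current_file: str) -> Dict[str, float]:
        """Estimate P(next file | current file) for a user in `cluster`."""
        counts = self.cluster_counts.get((cluster, current_file), Counter())
        if sum(counts.values()) < self.min_observations:
            # Too little cluster data: fall back to the aggregate over all users.
            counts = self.global_counts.get(current_file, Counter())
        total = sum(counts.values())
        return {name: c / total for name, c in counts.items()} if total else {}


stats = ClusterPrefetchStats(min_observations=3)
for _ in range(3):
    stats.record("cluster-A", "F", "G1")
stats.record("cluster-B", "F", "G2")
print(stats.next_file_probabilities("cluster-A", "F"))  # {'G1': 1.0}
print(stats.next_file_probabilities("cluster-B", "F"))  # {'G1': 0.75, 'G2': 0.25}
```

In the example, cluster-B has only one observation after file F, below the threshold of three, so its prediction comes from the aggregate counts over all users rather than from its own sparse data.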
The difficult task is for proxy server S, each time it retrieves a file F in response to a request, to identify the files G1 . . . Gk that should be triggered by the request for file F and pre-fetched immediately. Proxy server S employs a cost-benefit analysis, performing each pre-fetch whose benefit exceeds a user-determined multiple of its cost; the user may set the multiplier low for aggressive prefetching or high for conservative prefetching. These pre-fetches may be performed in parallel. The benefit of pre-fetching file Gi immediately is defined to be the expected number of seconds saved by such a pre-fetch, as compared to a situation where Gi is left to be retrieved later (either by a later pre-fetch or by the user's request), if at all. The cost of pre-fetching file Gi immediately is defined to be the expected cost for proxy server S to retrieve file Gi, as determined for example by the network locations of server S and file Gi and by information provider charges, times 1 minus the probability that proxy server S will have to retrieve file Gi within t minutes (to satisfy either a later pre-fetch or the user's explicit request) if it is not pre-fetched now.
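The decision rule just defined can be written compactly. The symbols m (the user-determined multiplier), c(G_i) (the expected retrieval cost), p_t(G_i) (the probability that S would otherwise have to retrieve G_i within t minutes), and b(G_i) (the expected number of seconds saved) are introduced here for illustration only:

$$\mathrm{cost}(G_i) = c(G_i)\,\bigl(1 - p_t(G_i)\bigr), \qquad \text{pre-fetch } G_i \text{ now} \iff b(G_i) > m \cdot \mathrm{cost}(G_i)$$

For instance, with m = 2, c(G_i) = 10 cost units, p_t(G_i) = 0.6, and b(G_i) = 9 seconds, the cost is 10 × 0.4 = 4, and since 9 > 2 × 4 = 8, file G_i would be pre-fetched.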

The above definitions of cost and benefit have some attractive properties. For example, suppose users tend to retrieve either file F1 or file F2 after file F, and tend only in the former case to subsequently retrieve file Gi. Then the system will generally not pre-fetch Gi immediately after retrieving file F: to the extent that the user is likely to retrieve file F2, the cost of the pre-fetch is high, and to the extent that the user is likely to retrieve file F1 instead, the benefit of the pre-fetch is low, since the system can save as much or nearly as much time by waiting until the user chooses F1 and pre-fetching Gi only then.

The proxy server S may estimate the necessary costs and 
benefits by adhering to the following discipline: 

1. Proxy server S maintains a set of disjoint clusters of the users in its user base, clustered according to their user profiles.

2. Proxy server S maintains an initially empty set PFT of "pre-fetch triples" <C,F,G>, where F and G are files, and where C identifies either a cluster of users or the set of all users in the user base of proxy server S. Each pre-fetch triple in the set PFT is associated with several stored values specific to that triple. Pre-fetch triples and their associated values are maintained according to the rules in 3 and 4.

3. Whenever a user U in the user base of proxy server S makes a request R2 for a file G, or a request R2 that triggers file G, proxy server S takes the following actions:

a. For C being the user cluster containing user U, and then again for C being the set of all users:

b. For any request R0 for a file, say file F, made by user U during the t minutes strictly prior to the request R2:

c. If the triple <C,F,G> is not currently a member of the set PFT, it is added to the set PFT with a count of 0, a trigger-count of 0, a target-count of 0, a total benefit of 0, and a timestamp whose value is the current date and time.

d. The count of the triple <C,F,G> is increased by one. 

e. If file G was not triggered or explicitly retrieved by any request that user U made strictly in between requests R0 and R2, then the target-count of the triple <C,F,G> is increased by one.

f. If request R2 was a request for file G, then the total benefit of triple <C,F,G> is increased either by the time elapsed between request R0 and request R2, or by the expected time to retrieve file G, whichever is less.

g. If request R2 was a request for file G, and G was triggered or explicitly retrieved by one or more requests that user U made strictly in between requests R0 and R2, with R1 denoting the earliest such request, then the total benefit of triple <C,F,G> is decreased either by the time elapsed between request R1 and request R2, or by the expected time to retrieve file G, whichever is less.

4. If a user U requests a file F, then the trigger-count is incremented by one for each triple currently in the set PFT such that the triple has the form <C,F,G>, where user U is in the set or cluster identified by C.

5. The "age" of a triple <C,F,G> is defined to be the number of days elapsed between its timestamp and the current date and time. If the age of any triple <C,F,G> exceeds a fixed constant number of days, and also exceeds a fixed constant multiple of the triple's count, then the triple may be deleted from the set PFT.
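Rules 3-5 can be sketched as a small in-memory table. This is a hedged illustration under stated assumptions: times are in seconds, `Request`, `Stats`, `PrefetchTable`, and the `ALL_USERS` label are hypothetical names, each request records the one file it asks for plus the set of files it triggered, and expected retrieval times are supplied as a plain dictionary. Rule 4 is applied before rule 3 for each incoming request, an ordering the text leaves open.

```python
from dataclasses import dataclass

ALL_USERS = "ALL"  # hypothetical label for "the set of all users in the user base"


@dataclass
class Request:
    time: float                         # seconds
    file: str                           # file explicitly requested
    triggered: frozenset = frozenset()  # files this request triggered

    def touches(self, g):
        return g == self.file or g in self.triggered


@dataclass
class Stats:
    count: int = 0
    trigger_count: int = 0
    target_count: int = 0
    total_benefit: float = 0.0
    timestamp: float = 0.0              # creation time, seconds


class PrefetchTable:
    def __init__(self, window_s, expected_fetch_s):
        self.window_s = window_s                  # t minutes, in seconds
        self.expected_fetch_s = expected_fetch_s  # file -> expected retrieval time (s)
        self.pft = {}                             # (C, F, G) -> Stats
        self.history = {}                         # user -> recent Requests, oldest first

    def record(self, user, cluster, r2):
        groups = (cluster, ALL_USERS)
        # Rule 4: a request for file F bumps the trigger-count of every <C, F, G>.
        for (c, f, _), s in self.pft.items():
            if f == r2.file and c in groups:
                s.trigger_count += 1
        past = self.history.get(user, [])
        recent = [r for r in past if r2.time - self.window_s <= r.time < r2.time]
        # Rule 3: R2 requested or triggered each of these files G.
        for g in {r2.file} | set(r2.triggered):
            fetch = self.expected_fetch_s.get(g, 0.0)
            for c in groups:                                   # 3a
                for i, r0 in enumerate(recent):                # 3b
                    key = (c, r0.file, g)
                    if key not in self.pft:                    # 3c
                        self.pft[key] = Stats(timestamp=r2.time)
                    s = self.pft[key]
                    s.count += 1                               # 3d
                    between = [r for r in recent[i + 1:] if r.touches(g)]
                    if not between:                            # 3e
                        s.target_count += 1
                    if r2.file == g:                           # 3f
                        s.total_benefit += min(r2.time - r0.time, fetch)
                        if between:                            # 3g, with R1 = between[0]
                            s.total_benefit -= min(r2.time - between[0].time, fetch)
        # Only requests inside the window can matter later, so older ones are dropped.
        self.history[user] = recent + [r2]

    def prune(self, now, max_age_days, count_multiple):
        # Rule 5: drop triples that are old both absolutely and relative to their count.
        for key, s in list(self.pft.items()):
            age_days = (now - s.timestamp) / 86400.0
            if age_days > max_age_days and age_days > count_multiple * s.count:
                del self.pft[key]
```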
Proxy server S can therefore decide rapidly which files G should be triggered by a request for a given file F from a given user U, as follows:

1. Let C0 be the user cluster containing user U, and C1 be the set of all users.

2. Server S constructs a list L of all triples <C0,F,G> such that <C0,F,G> appears in set PFT with a count exceeding a fixed threshold.

3. Server S adds to list L all triples <C1,F,G> such that <C0,F,G> does not appear on list L and <C1,F,G> appears in set PFT with a count exceeding another fixed threshold.

4. For each triple <C,F,G> on list L:

5. Server S computes the cost of triggering file G to be the expected cost of retrieving file G, times 1 minus the quotient of the target-count of <C,F,G> by the trigger-count of <C,F,G>.

6. Server S computes the benefit of triggering file G to be the total benefit of <C,F,G> divided by the count of <C,F,G>.

7. Finally, proxy server S uses the computed cost and benefit, as described earlier, to decide whether file G should be triggered.

The approach to prefetching just described has the advantage that all data storage and manipulation concerning prefetching decisions by proxy server S is handled locally at proxy server S. However, this "user-based" approach does lead to duplicated storage and effort across proxy servers, as well as incomplete data at each individual proxy server. That is, the information indicating what files are frequently retrieved after file F is scattered in an uncoordinated way across numerous proxy servers. An alternative, "file-based" approach is to store all such information with file F itself.

The difference is as follows. In the user-based approach, a pre-fetch triple <C,F,G> in server S's set PFT may mention any file F and any file G on the network, but is restricted to clusters C that are subsets of the user base of server S. By contrast, in the file-based approach, a pre-fetch triple <C,F,G> in server S's set PFT may mention any user cluster C and any file G on the network, but is restricted to files F that are stored on server S. (Note that in the file-based approach, user clustering is network-wide, and user clusters may include users from different proxy servers.) When a proxy server S2 sends a request to server S to retrieve file F for a user U, server S2 indicates in this message user U's user cluster C0, as well as user U's value for the user-determined multiplier that is used in cost-benefit analysis. Server S can use this information, together with all its triples in its set PFT of the form <C0,F,G> and <C1,F,G>, where C1 is the set of all users everywhere on the network, to determine (exactly as in the user-based approach) which files G1 . . . Gk are triggered by the request for file F. When server S sends file F back to proxy server S2, it also sends this list of files G1 . . . Gk, so that proxy server S2 can proceed to pre-fetch files G1 . . . Gk.
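The selection procedure of steps 1-7, which the file-based approach reuses unchanged, might look like the following sketch. The function name, the thresholds `thr0` and `thr1`, and the `ALL_USERS` label are hypothetical. Two choices are assumptions rather than part of the text: a triple whose trigger-count is still 0 is charged the full retrieval cost, and the factor "1 minus the quotient" is clamped at 0, because the counting rules can let a newly created triple's target-count briefly exceed its trigger-count.

```python
from typing import NamedTuple


class Stats(NamedTuple):
    count: int
    trigger_count: int
    target_count: int
    total_benefit: float  # seconds


ALL_USERS = "ALL"  # hypothetical label for C1, the set of all users


def files_to_trigger(pft, c0, f, expected_cost, multiplier, thr0, thr1):
    """Return the files G that a request for file f by a user in cluster c0 should trigger."""
    # Step 2: triples for the user's own cluster C0 that have enough data.
    chosen = {g: s for (c, ff, g), s in pft.items()
              if c == c0 and ff == f and s.count > thr0}
    # Step 3: fall back to all-user statistics for files G that C0 does not cover.
    for (c, ff, g), s in pft.items():
        if c == ALL_USERS and ff == f and g not in chosen and s.count > thr1:
            chosen[g] = s
    result = []
    for g, s in chosen.items():                                      # step 4
        p_target = s.target_count / s.trigger_count if s.trigger_count else 0.0
        cost = expected_cost.get(g, 0.0) * max(0.0, 1.0 - p_target)  # step 5
        benefit = s.total_benefit / s.count                          # step 6
        if benefit > multiplier * cost:                              # step 7
            result.append(g)
    return sorted(result)
```

Raising `multiplier` makes prefetching more conservative, exactly as the user-determined multiple is meant to.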
The file-based approach requires some additional data transmission. Recall that under the user-based approach, server S must execute steps 3c-3g above for any ordered pair of requests R0 and R2 made within t minutes of each other by a user who employs server S as a proxy server. Under the file-based approach, server S must execute steps 3c-3g above for any ordered pair of requests R0 and R2 made within t minutes of each other, by any user on the network, such that R0 requests a file stored on server S. Therefore, when a user makes a request R2, the user's proxy server must send a notification of request R2 to all servers S such that, during the preceding t minutes (where the variable t may now depend on server S), the user has made a request R0 for a file stored on server S. This notification need not be sent immediately, and it is generally more efficient for each proxy server to buffer up such notifications and send them periodically in groups to the appropriate servers.
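The notification batching just described can be sketched from the proxy server's side as follows. The names are hypothetical, the map from each file to the server storing it is assumed to be known to the proxy, and one window t is used for every server even though the text allows t to vary by server.

```python
from collections import defaultdict


class NotificationBuffer:
    """Hypothetical sketch: a proxy server batching request notifications
    for the file-based approach."""

    def __init__(self, window_s, home_server):
        self.window_s = window_s          # t, fixed here although it may vary per server
        self.home_server = home_server    # hypothetical map: file -> server storing it
        self.recent = defaultdict(list)   # user -> [(time, file)] within the window
        self.pending = defaultdict(list)  # server -> [(user, time, file)] not yet sent

    def on_request(self, user, time, file):
        # Requests R0 by this user during the t seconds strictly before R2.
        hist = [(t, f) for t, f in self.recent[user] if time - self.window_s <= t < time]
        # Queue a notification of R2 for every server storing a file requested by such an R0.
        for server in {self.home_server[f] for _, f in hist}:
            self.pending[server].append((user, time, file))
        self.recent[user] = hist + [(time, file)]

    def flush(self):
        # Called periodically: hand over one batch per server and start afresh.
        batches = dict(self.pending)
        self.pending.clear()
        return batches
```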

Access And Reachability Control of Users and User-Specific Information

Although users' true identities are protected by the use of
secure mix paths, pseudonymity does not guarantee com- 
plete privacy. In particular, advertisers can in principle 
employ user-specific data to barrage users with unwanted 
solicitations. The general solution to this problem is for 
proxy server S2 to act as a representative on behalf of each 
user in its user base, permitting access to the user and the 
user's private data only in accordance with criteria that have 
been set by the user. Proxy server S2 can restrict access in 
two ways: 

1. The proxy server S2 may restrict access by third parties to server S2's pseudonymous database of user-specific information. When a third party such as an advertiser sends a message to server S2 requesting the release of user-specific information for a pseudonym P, server S2 refuses to honor the request unless the message includes credentials for the accessor adequate to prove that the accessor is entitled to this information. The user associated with pseudonym P may at any time send signed control messages to proxy server S2, specifying the credentials or Boolean combinations of credentials that proxy server S2 should thenceforth consider to be adequate grounds for releasing a specified subset of the information associated with pseudonym P. Proxy server S2 stores these access criteria with its database record for pseudonym P. For example, a user might wish proxy server S2 to release purchasing information only to selected information providers, to charitable organizations (that is, organizations that can provide a government-issued credential that is issued only to registered charities), and to market researchers who have paid user U for the right to study user U's purchasing habits.

2. The proxy server S2 may restrict the ability of third parties to send electronic messages to the user. When a third party such as an advertiser attempts to send information (such as a textual message or a request to enter into spoken or written real-time communication) to pseudonym P, by sending a message to proxy server S2 requesting proxy server S2 to forward the information to the user at pseudonym P, proxy server S2 will refuse to honor the request unless the message includes credentials for the accessor adequate to meet the requirements the user has chosen to impose, as above, on third parties who wish to send information to the user. If the message does include adequate credentials, then proxy server S2 removes a single-use pseudonymous return address envelope from its database record for pseudonym P, and uses the envelope to send a message containing the specified information along a secure mix path to the user of pseudonym P. If the envelope being used is the only envelope stored for pseudonym P, or more generally if the supply of such envelopes is low, proxy server S2 adds a notation to this message before sending it, which notation indicates to the user's local server that it should send additional envelopes to proxy server S2 for future use.
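The forwarding rule in item 2 can be sketched as follows. This is a simplified illustration with hypothetical names: envelopes are treated as opaque values, the user's requirements are reduced to "any one of a set of credentials" rather than an arbitrary Boolean combination, a low supply is taken to mean fewer than `low_water` envelopes remaining after the current one is used, and refusing when no envelope is left is an assumption the text does not address.

```python
from collections import deque


class PseudonymRecord:
    """Hypothetical per-pseudonym state kept by proxy server S2."""

    def __init__(self, envelopes, accepted_credentials, low_water=1):
        self.envelopes = deque(envelopes)          # single-use return-address envelopes
        self.accepted = set(accepted_credentials)  # any one suffices (simplification)
        self.low_water = low_water                 # "low supply" threshold (assumption)


def forward(record, message, credentials):
    """Return (envelope, payload) to send along the mix path, or None if refused."""
    if not (record.accepted & set(credentials)):
        return None                                # credentials inadequate: refuse
    if not record.envelopes:
        return None                                # no envelope left (assumed: refuse)
    envelope = record.envelopes.popleft()          # each envelope is used only once
    payload = {"body": message}
    if len(record.envelopes) < record.low_water:
        payload["send_more_envelopes"] = True      # notation for the user's local server
    return envelope, payload
```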

In a more general variation, the user may instruct the proxy server S2 to impose more complex requirements on the granting of requests by third parties, not simply Boolean combinations of required credentials. The user may impose any Boolean combination of simple requirements that may include, but are not limited to, the following:

(a.) the accessor (third party) is a particular party

(b.) the accessor has provided a particular credential 

(c.) satisfying the request would involve disclosure to the
accessor of a certain fact about the user's user profile 

(d.) satisfying the request would involve disclosure to the 
accessor of the user's target profile interest summary 

(e.) satisfying the request would involve disclosure to the 
accessor of statistical summary data, which data are 
computed from the user's user profile or target profile 
interest summary together with the user profiles and 
target profile interest summaries of at least n other users 
in the user base of the proxy server 

(f.) the content of the request is to send the user a target 
object, and this target object has a particular attribute 
(such as high reading level, or low vulgarity, or an 
authenticated Parental Guidance rating from the 
MPAA) 

(g.) the content of the request is to send the user a target 
object, and this target object has been digitally signed 
with a particular private key (such as the private key 
used by the National Pharmaceutical Association to 
certify approved documents) 

(h.) the content of the request is to send the user a target
object, and the target profile has been digitally signed 
by a profile authentication agency, guaranteeing that 
the target profile is a true and accurate profile of the 
target object it claims to describe, with all attributes 
authenticated. 

(i.) the content of the request is to send the user a target
object, and the target profile of this target object is 
within a specified distance of a particular search profile 
specified by the user 

(j.) the content of the request is to send the user a target 
object, and the proxy server S2, by using the user's
stored target profile interest summary, estimates the 
user's likely interest in the target object to be above a 
specified threshold 

(k.) the accessor indicates its willingness to make a 
particular payment to the user in exchange for the
fulfillment of the request 
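One way to represent such Boolean combinations is as composable predicate functions over a request, as in the hedged sketch below. Only requirements (a.), (b.), (j.), and (k.) are modeled; the `AccessRequest` fields and all function names are hypothetical, and (j.) assumes proxy server S2 has already computed its interest estimate from the user's stored target profile interest summary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessRequest:
    accessor: str                         # (a.) identity of the third party
    credentials: frozenset = frozenset()  # (b.) credentials presented
    offered_payment: float = 0.0          # (k.) payment offered to the user
    estimated_interest: float = 0.0       # (j.) S2's estimate of the user's interest


Predicate = Callable[[AccessRequest], bool]


def is_party(name: str) -> Predicate:
    return lambda r: r.accessor == name


def has_credential(cred: str) -> Predicate:
    return lambda r: cred in r.credentials


def interest_above(threshold: float) -> Predicate:
    return lambda r: r.estimated_interest > threshold


def pays_at_least(amount: float) -> Predicate:
    return lambda r: r.offered_payment >= amount


def all_of(*ps: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in ps)


def any_of(*ps: Predicate) -> Predicate:
    return lambda r: any(p(r) for p in ps)


def negate(p: Predicate) -> Predicate:
    return lambda r: not p(r)


# Example policy: never the party "spam-co"; otherwise registered charities,
# market researchers who pay at least 5 units, or offers of high estimated interest.
policy = all_of(
    negate(is_party("spam-co")),
    any_of(has_credential("registered-charity"),
           all_of(has_credential("market-researcher"), pays_at_least(5.0)),
           interest_above(0.8)),
)
```

The resulting `policy` is the "complex predicate" of the next paragraph: true for requests the user wants honored and false otherwise.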
The steps required to create and maintain the user's access-control requirements are as follows:

1. The user composes a Boolean combination of predicates that apply to requests; the resulting complex predicate should be true when applied to a request that the user wants proxy server S2 to honor, and false otherwise. The complex predicate may be encoded in another form, for efficiency.

2. The complex predicate is signed with SK_P and transmitted from the user's processor C3 to the proxy server S2 through the mix path, enclosed in a packet that also contains the user's pseudonym P.

3. The proxy server S2 receives the packet, verifies its authenticity using PK_P, and stores the access control instructions specified in the packet as part of its database record for pseudonym P.

The proxy server S2 enforces access control as follows:

1. The third party (accessor) transmits a request to proxy server S2 using the normal point-to-point connections provided by the network N. The request may be to access the target profile interest summaries associated with a set of pseudonyms P1 . . . Pn, or to access the user profiles associated with a set of pseudonyms P1 . . . Pn, or to forward a message to the users associated with pseudonyms P1 . . . Pn. The accessor may explicitly specify the pseudonyms P1 . . . Pn, or may ask that P1 . . . Pn be chosen to be the set of all pseudonyms registered with proxy server S2 that meet specified conditions.

2. The proxy server S2 indexes the database record for each pseudonym Pi (1 <= i <= n), retrieves the access requirements provided by the user associated with Pi, and determines whether and how the transmitted request should be satisfied for Pi. If the requirements are satisfied, S2 proceeds with steps 3a-3c.

3a. If the request can be satisfied but only upon payment of a fee, the proxy server S2 transmits a payment request to the accessor, and waits for the accessor to send the payment to the proxy server S2. Proxy server S2 retains a service fee and forwards the balance of the payment to the user associated with pseudonym Pi via an anonymous return packet.

3b. If the request can be satisfied but only upon provision of a credential, the proxy server S2 transmits a credential request to the accessor, and waits for the accessor to send the credential to the proxy server S2.

3c. The proxy server S2 satisfies the request by disclosing user-specific information to the accessor, by providing the accessor with a set of single-use envelopes to communicate with the user, or by forwarding a message to the user, as requested.

4. Proxy server S2 optionally sends a message to the accessor, indicating why each of the denied requests for P1 . . . Pn was denied, and/or indicating how many requests were satisfied.

5. The active and/or passive relevance feedback provided by any user U with respect to any target object sent by any path from the accessor is tabulated by the above-described tabulating process resident in the summary processor C3. As described above, a summary of such information is periodically transmitted to the proxy server S2 to enable the proxy server S2 to update that user's target profile interest summary and user profile.

The access control criteria can be applied to solicited as well as unsolicited transmissions. That is, the proxy server can be used to protect the user from inappropriate or misrepresented target objects that the user may request. If a user requests a target object from an information server, but the target object turns out not to meet the access control criteria, then the proxy server will not permit the information server to transmit the target object to the user, or to charge the user for such transmission. For example, to guard against target objects whose profiles have been tampered with, the user may instruct the proxy server to permit transmission only of target objects whose target profiles have been digitally signed by a profile authentication agency. As another example, a user may instruct the proxy server that only target objects that have been digitally signed by a recognized child protection organization may be transmitted to the user; the proxy server thus will not let the user retrieve pornography, even from a rogue information server that is willing to provide pornography to users who have not supplied an adulthood credential.

Distribution of Information with Multicast Trees

The graphical representation of the network N presented in FIG. 3 shows that at least one of the data communications links can be eliminated, as shown in FIG. 4, while still enabling the network N to transmit messages among all the servers A-D. By elimination, we mean that the link is unused in the logical design of the network, rather than a physical disconnection of the link. The graphs that result when all redundant data communications links are eliminated are termed "trees" or "connected acyclic graphs." A graph where a message could be transmitted by a server through other servers and then return to the transmitting server over a different originating data communications link is termed a "cycle." A tree is thus an acyclic graph whose edges (links) connect a set of graph "nodes" (servers). The tree can be used to efficiently broadcast any data file to selected servers in a set of interconnected servers.

The tree structure is attractive in a communications network because much information distribution is multicast in nature; that is, a piece of information available at a single source must be distributed to a multiplicity of points where the information can be accessed. This technique is widely known: for example, "fax trees" are in common use in political organizations, and multicast trees are widely used for distribution of multimedia data in the Internet (see, for example, "Scalable Feedback Control for Multicast Video Distribution in the Internet," Jean-Chrysostome Bolot, Thierry Turletti & Ian Wakeman, Computer Communication Review, Vol. 24, #4, October '94, Proceedings of SIGCOMM'94, pp. 58-67, or "An Architecture for Wide-Area Multicast Routing," Stephen Deering, Deborah Estrin, Dino Farinacci, Van Jacobson, Ching-Gung Liu & Liming Wei, Computer Communication Review, Vol. 24, #4, October '94, Proceedings of SIGCOMM'94, pp. 126-135). While there are many possible trees that can be overlaid on a graph representation of a network, both the nature of the networks (e.g., the cost of transmitting data over a link) and their use (for example, certain nodes may exhibit more frequent intercommunication) can make one choice of tree better than another for use as a multicast tree. One of the most difficult problems in practical network design is the construction of "good" multicast trees, that is, tree choices which exhibit low cost (due to data not traversing links unnecessarily) and good performance (due to data frequently being close to where it is needed).

Constructing a Multicast Tree

Algorithms for constructing multicast trees have either been ad hoc, as is the case of the Deering et al. Internet multicast tree, which adds clients as they request service by
grafting them into the existing tree, or by construction of a minimum cost spanning tree. A distributed algorithm for creating a spanning tree (defined as a tree that connects, or "spans," all nodes of the graph) on a set of Ethernet bridges was developed by Radia Perlman ("Interconnections: Bridges and Routers," Radia Perlman, Addison-Wesley, 1992). Creating a minimal-cost spanning tree for a graph depends on having a cost model for the arcs of the graph (corresponding to communications links in the communications network). In the case of Ethernet bridges, the default cost (more complicated costing models for path costs are discussed on pp. 72-73 of Perlman) is calculated as a simple distance measure to the root; thus the spanning tree minimizes the cost to the root. In this algorithm, the root is elected by recourse to a numeric ID contained in "configuration messages": the server whose ID has the minimum value is chosen as the root. Several problems exist with this algorithm in general. First, the method of using an ID does not necessarily select the best root for the nodes interconnected in the tree. Second, the cost model is simplistic.

We first show how to use the similarity-based methods described above to select the servers most interested in a group of target objects, herein termed "core servers" for that group. Next we show how to construct an unrooted multicast tree that can be used to broadcast files to these core servers. Finally, we show how files corresponding to target objects are actually broadcast through the multicast tree at the initiative of a client, and how these files are later retrieved from the core servers when clients request them.

Since the choice of core servers to distribute a file to depends on the set of users who are likely to retrieve the file (that is, the set of users who are likely to be interested in the corresponding target object), a separate set of core servers and hence a separate multicast tree may be used for each topical group of target objects. Throughout the description below, servers may communicate among themselves through any path over which messages can travel; the goal of each multicast tree is to optimize the multicast distribution of files corresponding to target objects of the corresponding topic. Note that this problem is completely distinct from selecting a multiplicity of spanning trees for the complete set of interconnected nodes, as disclosed by Sincoskie in U.S. Pat. No. 4,706,080 and in the publication titled "Extended Bridge Algorithms for Large Networks" by W. D. Sincoskie and C. J. Cotton, published January 1988 in IEEE Network on pages 15-24. The trees in this disclosure are intentionally designed to interconnect a selected subset of nodes in the system, and are successful to the degree that this subset is relatively small.

Multicast Tree Construction Procedure

A set of topical multicast trees for a set of homogeneous target objects may be constructed or reconstructed at any time, as follows. The set of target objects is grouped into a fixed number of topical clusters C1 . . . Cp with the methods described above, for example, by choosing C1 . . . Cp to be the result of a k-means clustering of the set of target objects, or alternatively a covering set of low-level clusters from a hierarchical cluster tree of these target objects. A multicast tree MT(C) is then constructed for each cluster C in C1 . . . Cp by the following procedure:

1. Given a set of proxy servers S1 . . . Sn and a topical cluster C, it is assumed that a general multicast tree MT_full that contains all the proxy servers S1 . . . Sn has previously been constructed by well-known methods.

2. Each pair <Si, C> is associated with a weight, w(Si, C), which is intended to covary with the expected number of users in the user base of proxy server Si who will subsequently access a target object from cluster C. This weight is computed by proxy server Si in any of several ways, all of which make use of the similarity measurement computation described herein.

One variation makes use of the following steps: (a) Proxy server Si randomly selects a target object T from cluster C. (b) For each pseudonym in its local database, with associated user U, proxy server Si applies the techniques disclosed above to user U's stored user profile and target profile interest summary in order to estimate the interest w(U, T) that user U has in the selected target object T. The aggregate interest w(Si, T) that the user base of proxy server Si has in the target object T is defined to be the sum of these interest values w(U, T). Alternatively, w(Si, T) may be defined to be

$$w(S_i, T) = \sum_{U} s\bigl(w(U, T)\bigr)$$

Here s(·) is a sigmoidal function that is close to 0 for small values of its argument and close to 1 for large values; thus s(w(U, T)) estimates the probability that user U will access target object T, which probability is assumed to be independent of the probability that any other user will access target object T. In a variation, w(Si, T) is made to estimate the probability that at least one user from the user base of Si will access target object T; then w(Si, T) may be defined as the maximum of the values w(U, T), or as 1 minus the product over the users U of the quantity (1 - s(w(U, T))). (c) Proxy server Si repeats steps (a)-(b) for several target objects T selected randomly from cluster C and averages the several values of w(Si, T) thereby computed in step (b) to determine the desired quantity w(Si, C), which quantity represents the expected aggregate interest by the user base of proxy server Si in the target objects of cluster C.

In another variation, where target profile interest summaries are embodied as search profile sets, the following procedure is followed to compute w(Si, C): (a) For each search profile P_s in the locally stored search profile set of any user in the user base of proxy server Si, proxy server Si computes the distance d(P_s, P_c) between the search profile and the cluster profile P_c of cluster C. (b) w(Si, C) is chosen to be the maximum value of -d(P_s, P_c)/r across all such search profiles P_s, where r is computed as an affine function of the cluster diameter of cluster C. The slope and/or intercept of this affine function are chosen to be smaller (thereby increasing w(Si, C)) for servers Si for which the target object provider wishes to improve performance, as may be the case if the users in the user base of proxy server Si pay a premium for improved performance, or if performance at Si will otherwise be unacceptably low due to slow network connections.

In another variation, the proxy server Si is modified so that it maintains not only target profile interest summaries for each user in its user base, but also a single aggregate target profile interest summary for the entire user base. This aggregate target profile interest summary is determined in the usual way from relevance feedback, but the relevance feedback on a target object in this case is considered to be the frequency with which users in the user base retrieved the target object when it was new. Whenever a user retrieves a target object by means of a request to proxy server Si, the aggregate target profile interest summary for proxy server Si is updated. In this variation, w(Si, C) is estimated by the following steps:

(a) Proxy server Si randomly selects a target object T from cluster C.

(b) Proxy server Si applies the techniques disclosed above to its stored aggregate target profile interest summary in order to estimate the aggregate interest w(Si, T) that its
aggregated user base had in the selected target object T when new; this may be interpreted as an estimate of the likelihood that at least one member of the user base will retrieve a new target object similar to T.

(c) Proxy server Si repeats steps (a)-(b) for several target objects T selected randomly from cluster C, and averages the several values of w(Si, T) thereby computed in step (b) to determine the desired quantity w(Si, C), which quantity represents the expected aggregate interest by the user base of proxy server Si in the target objects of cluster C.
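The variations of step 2 that start from individual users' interests can be sketched as follows. The text requires only "a sigmoidal function" s(·); the logistic function used here is an assumption, as are the function names and the `interests_for` callback, which stands in for the per-user estimates w(U, T) that proxy server Si computes from each stored user profile and target profile interest summary.

```python
import math
import random


def sigmoid(x):
    # Logistic function as one possible s(.): near 0 for small x, near 1 for large x.
    # The two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def server_interest(user_interests, mode="expected"):
    """Aggregate interest w(Si, T) from the values w(U, T) of Si's users."""
    if mode == "sum":
        return sum(user_interests)
    if mode == "expected":      # sum of s(w(U, T)): expected number of accessing users
        return sum(sigmoid(w) for w in user_interests)
    if mode == "max":
        return max(user_interests)
    if mode == "at_least_one":  # 1 - product of (1 - s(w(U, T))), assuming independence
        p_none = 1.0
        for w in user_interests:
            p_none *= 1.0 - sigmoid(w)
        return 1.0 - p_none
    raise ValueError(f"unknown mode: {mode}")


def cluster_weight(cluster_objects, interests_for, samples, mode="expected", rng=random):
    """Estimate w(Si, C) by averaging w(Si, T) over randomly sampled T in cluster C."""
    total = 0.0
    for _ in range(samples):
        t = rng.choice(cluster_objects)                   # step (a)
        total += server_interest(interests_for(t), mode)  # step (b)
    return total / samples                                # step (c)
```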

3. Those servers Si from among S1 . . . Sn with the greatest weights w(Si, C) are designated "core servers" for cluster C. In one variation, where it is desired to select a fixed number of core servers, those servers Si with the greatest values of w(Si, C) are selected. In another variation, the value of w(Si, C) for each server Si is compared against a fixed threshold, and those servers Si such that w(Si, C) equals or exceeds the threshold are selected as core servers. If cluster C represents a narrow and specialized set of target objects, as often happens when the clusters C1 . . . Cp are numerous, it is usually adequate to select only a small number of core servers for cluster C, thereby obtaining substantial advantages in computational efficiency in steps 4-5 below.
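The two selection variations of step 3 amount to a top-k choice or a threshold test. A minimal sketch, with hypothetical names, follows.

```python
def core_servers(weights, k=None, threshold=None):
    """weights: dict mapping server Si -> w(Si, C).

    Select either the k servers with the greatest weights, or every server
    whose weight equals or exceeds a fixed threshold.
    """
    if (k is None) == (threshold is None):
        raise ValueError("give exactly one of k or threshold")
    if k is not None:
        return sorted(weights, key=weights.get, reverse=True)[:k]
    return [s for s, w in weights.items() if w >= threshold]
```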

4. A complete graph G(C) is constructed whose vertices are the designated core servers for cluster C. For each pair of core servers, the cost of transmitting a message between those core servers along the cheapest path is estimated, and the weight of the edge connecting those core servers is taken to be this cost. The cost is determined as a function of average transmission charges, average transmission delay, and worst-case or near-worst-case transmission delay.

5. The multicast tree MT(C) is computed by standard methods to be the minimum spanning tree (or a near-minimum spanning tree) for G(C), where the weight of an edge between two core servers is taken to be the cost of transmitting a message between those two core servers. Note that MT(C) does not contain as vertices all proxy servers S1 . . . Sn, but only the core servers for cluster C.
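Steps 4-5 can be sketched with Prim's algorithm, one standard minimum-spanning-tree method (the text does not name a particular one). Because G(C) is complete, edge weights are computed on demand by a caller-supplied `cost(a, b)` function, a hypothetical stand-in for the estimated cheapest-path cost between two core servers.

```python
import heapq


def multicast_tree(core, cost):
    """Return the edges (u, v, cost) of a minimum spanning tree of the complete
    graph whose vertices are the core servers listed in `core`."""
    if not core:
        return []
    start = core[0]
    in_tree = {start}
    heap = [(cost(start, v), start, v) for v in core[1:]]
    heapq.heapify(heap)
    edges = []
    while heap and len(in_tree) < len(core):
        c, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                      # stale entry: v has already joined the tree
        in_tree.add(v)
        edges.append((u, v, c))
        for x in core:                    # every vertex not yet in the tree is adjacent to v
            if x not in in_tree:
                heapq.heappush(heap, (cost(v, x), v, x))
    return edges
```

Keeping the number of core servers small, as step 3 recommends, matters here because the complete graph G(C) has a number of edges quadratic in the number of core servers.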

6. A message M is formed describing the cluster profile
for cluster C, the core servers for cluster C, and the
topology of the multicast tree MT(C) constructed on
those core servers. Message M is broadcast to all proxy
servers S1 ... Sn by means of the general multicast
tree. Each proxy server Si, upon receipt of message
M, extracts the cluster profile of cluster C and stores it
on a local storage device, together with certain other
information that it determines from message M, as
follows. If proxy server Si is named in message M as
a core server for cluster C, then proxy server Si extracts
and stores the subtree of MT(C) induced by all core
servers whose path distance from Si in the graph
MT(C) is less than or equal to d, where d is a constant
positive integer (usually from 1 to 3). If message M
does not name proxy server Si as a core server for
MT(C), then proxy server Si extracts and stores a list of
one or more nearby core servers that can be inexpensively
contacted by proxy server Si over virtual point-to-point
links.
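The subtree a core server stores in step 6 (all core servers within path distance d of itself) can be extracted with a breadth-first search that stops expanding at depth d. A minimal sketch, assuming MT(C) arrives as a list of edges; `local_subtree` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def local_subtree(tree_edges: List[Tuple[str, str]], me: str, d: int) -> Set[Tuple[str, str]]:
    """Edges of MT(C) induced by the core servers within path distance d of `me`."""
    adj: Dict[str, List[str]] = {}
    for a, b in tree_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {me: 0}
    queue = deque([me])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue  # do not expand beyond radius d
        for nbr in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Induced subtree: keep the tree edges whose endpoints are both within radius d.
    return {(a, b) for a, b in tree_edges if a in dist and b in dist}
```

Because MT(C) is a tree, the servers reached within radius d always induce a connected subtree centered on the storing server.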

In the network of FIG. 3, to illustrate the use of trees as
applied to the system of the present invention, consider the
following simple example, where it is assumed that client r
provides on-line information for the network, such as an
electronic newspaper. This information can be structured by
client r into a prearranged form, comprising a number of
files, each of which is associated with a different target
object. In the case of an electronic newspaper, the files can
contain textual representations of stock prices, weather
forecasts, editorials, etc. The system determines likely
demand for the target objects associated with these files in
order to optimize the distribution of the files through the
network N of interconnected clients p-s and proxy servers
A-D. Assume that cluster C consists of text articles relating
to the aerospace industry; further assume that the target
profile interest summaries of the users at clients p and r
indicate that these users are strongly interested in such
articles. Then the proxy servers A and B are selected as
core servers for the multicast tree MT(C). The multicast
tree MT(C) is then computed to consist of the core
servers A and B, connected by an edge that represents the
least costly virtual point-to-point link between A and B
(either the direct path A-B or the indirect path A-C-B,
depending on the cost).

Global Requests to Multicast Trees 

One type of message that may be transmitted to any proxy
server S is termed a "global request message." Such a
message M triggers the broadcast of an embedded request R
to all core servers in a multicast tree MT(C). The content of
request R and the identity of cluster C are included in the
message M, as is a field indicating that message M is a
global request message. In addition, the message M contains
a field S_back, which is unspecified except under certain
circumstances described below, when it names a specific core
server. A global request message M may be transmitted to
proxy server S by a user registered with proxy server S,
which transmission may take place along a pseudonymous
mix path, or it may be transmitted to proxy server S from
another proxy server, along a virtual point-to-point
connection.

When a proxy server S receives a message M that is
marked as a global request message, it acts as follows:

1. If proxy server S is not a core server for topic C, it
retrieves its locally stored list of nearby core servers for
topic C, selects from this list a nearby core server S', and
transmits a copy of message M over a virtual point-to-point
connection to core server S'. If this transmission fails,
proxy server S repeats the procedure with other core
servers on its list.

2. If proxy server S is a core server for topic C, it executes
the following steps:
(a) Act on the request R that is embedded in message M.
(b) Set S_0 to be S.
(c) Retrieve the locally stored subtree of MT(C), and extract
from it a list L of all core servers that are directly linked
to S_0 in this subtree.
(d) If the message M specifies a value for S_back and S_back
appears on the list L, remove S_back from the list L. Note
that list L may be empty before this step, or may become
empty as a result of this step.
(e) For each server Si in list L, transmit a copy of message M
from server S to server Si over a virtual point-to-point
connection, where the S_back field of the copy of message M
has been altered to S_0. If Si cannot be reached in a
reasonable amount of time by any virtual point-to-point
connection (for example, server Si is broken), recurse to
step (c) above with S_back bound to S_0 and S_0 bound to Si
for the duration of the recursion.
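The forwarding rule in step 2 can be exercised with a small single-process simulation. This is a minimal sketch under stated assumptions: the adjacency map `adj` stands in for MT(C), the set `broken` stands in for core servers that cannot be reached, delivery is modeled as an in-memory queue rather than real virtual point-to-point connections, and step 1 (routing from a non-core server) is omitted. The names `broadcast`, `within`, and `forward` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


def broadcast(adj: Dict[str, List[str]], start: str, broken: Set[str], d: int) -> Set[str]:
    """Return the core servers that act on request R (simulation of step 2)."""

    def within(center: str) -> Set[str]:
        # Core servers within path distance d of `center`: its locally stored subtree.
        dist = {center: 0}
        queue = deque([center])
        while queue:
            node = queue.popleft()
            if dist[node] < d:
                for nbr in adj[node]:
                    if nbr not in dist:
                        dist[nbr] = dist[node] + 1
                        queue.append(nbr)
        return set(dist)

    def forward(s0: str, s_back: Optional[str], local: Set[str],
                out: List[Tuple[str, Optional[str]]]) -> None:
        # Step (c): servers directly linked to s0 in the local subtree.
        for si in adj[s0]:
            if si not in local or si == s_back:  # step (d): never send back to S_back
                continue
            if si in broken:
                # Step (e) failure: recurse with S_back := s0 and S_0 := si.
                forward(si, s0, local, out)
            else:
                out.append((si, s0))  # copy of M whose S_back field is S_0

    acted: Set[str] = set()
    # Each pending delivery: (receiving server, S_back field of the copy it receives).
    pending: Deque[Tuple[str, Optional[str]]] = deque([(start, None)])
    while pending:
        s, s_back = pending.popleft()
        acted.add(s)  # step (a): act on request R
        out: List[Tuple[str, Optional[str]]] = []
        forward(s, s_back, within(s), out)  # step (b): S_0 := S
        pending.extend(out)
    return acted
```

On the path A-B-C-D with B broken, a radius of d = 1 leaves A unable to reach past B, while d = 2 lets A forward directly to C, which then continues the broadcast to D.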

When server S' in step 1 or a server Si in step 2(e) receives a copy of the global request message M, it acts according to exactly the same steps. As a result, all core servers eventually receive a copy of global request message M and act on the embedded request R, unless some core servers cannot be reached. Even if a core server is unreachable, step (e) ensures that the broadcast can continue to other core servers in most circumstances, provided that d > 1; higher values of d provide additional insurance against unreachable core servers.

Multicasting Files

The system for customized electronic identification of desirable objects executes the following steps in order to introduce a new target object into the system. These steps are initiated by an entity E, which may be either a user entering commands via a keyboard at a client processor q, as illustrated in FIG. 3, or an automatic software process resident on a client or server processor q.

1. Processor q forms a signed request R, which asks the receiver to store a copy of a file F on its local storage device. The file F, which is maintained by client q on storage at client q or on storage accessible by client q over the network, contains the informational content of, or an identifying description of, a target object as described above. The request R also includes an address at which entity E may be contacted (possibly a pseudonymous address at some proxy server D), and asks the receiver to store the fact that file F is maintained by an entity at said address.

2. Processor q embeds request R in a message M1, which it pseudonymously transmits to entity E's proxy server D as described above. Message M1 instructs proxy server D to broadcast request R along an appropriate multicast tree.

3. Upon receipt of message M1, proxy server D examines the doubly embedded file F and computes a target profile P for the corresponding target object. It compares the target profile P to each of the cluster profiles for the topical clusters C1, C2, ... described above, and chooses Ck to be the cluster with the smallest similarity distance to profile P.

4. Proxy server D sends itself a global request message M instructing itself to broadcast request R along the topical multicast tree MT(Ck).

5. Proxy server D notifies entity E through a pseudonymous communication that file F has been multicast along the topical multicast tree for cluster Ck.

As a result of the procedure that server D and other servers follow for acting on global request messages, step 4 eventually causes all core servers for topic Ck to act on request R and therefore store a local copy of file F. In order to make room for file F on its local storage device, a core server Si may have to delete a less useful file. There are several ways to choose a file to delete. One option, well known in the art, is for Si to choose to delete the least recently accessed file. In another variation, Si deletes a file that it believes few users will access. In this variation, whenever a server Si stores a copy of a file F, it also computes and stores the weight w(Si, C_F), where C_F is a cluster consisting of the single target object associated with file F. Then, when server Si needs to delete a file, it chooses to delete the file F with the lowest weight w(Si, C_F). To reflect the fact that files are accessed less as they age, server Si periodically multiplies its stored value of w(Si, C_F) by a decay factor, such as 0.95, for each file F that it then stores. Alternatively, instead of using a decay factor, server Si may periodically recompute the aggregate interest w(Si, C_F) for each file F that it stores; the aggregate interest changes over time because target objects typically have an age attribute that the system considers in estimating user interest, as described above.

If entity E later wishes to remove file F from the network, for example because it has just multicast an updated version, it pseudonymously transmits a digitally signed global request message to proxy server D, requesting all proxy servers in the multicast tree MT(Ck) to delete any local copy of file F that they may be storing.

Queries to Multicast Trees

In addition to global request messages, another type of message that may be transmitted to any proxy server S is termed a "query message." When transmitted to a proxy server, a query message causes a reply to be sent to the originator of the message; this reply will contain an answer to a given query Q if any of the servers in a given multicast tree MT(C) are able to answer it, and will otherwise indicate that no answer is available. The query and the cluster C are named in the query message. In addition, the query message contains a field S_back, which is unspecified except under circumstances described below, when it names a core server. When a proxy server S receives a message M that is marked as a query message, it acts as follows:

1. Server S sets A_r to be the return address for the client or server that transmitted message M to server S, either a network address or a pseudonymous address.

2. If proxy server S is not a core server for cluster C, it retrieves its locally stored list of nearby core servers for topic C, selects from this list a nearby core server S', and transmits a copy of the locate message M over a virtual point-to-point connection to core server S'. If this transmission fails, proxy server S repeats the procedure with other core servers on its list. Upon receiving a reply, it forwards this reply to address A_r.

3. If proxy server S is a core server for cluster C and it is able to answer query Q using locally stored information, then it transmits a "positive" reply to A_r containing the answer.

4. If proxy server S is a core server for topic C but it is unable to answer query Q using locally stored information, then it carries out a parallel depth-first search by executing the following steps:
(a) Set L to be the empty list.
(b) Retrieve the locally stored subtree of MT(C). For each server Si directly linked to S in this subtree, other than S_back (if specified), add the ordered pair (Si, S) to the list L.
(c) If L is empty, transmit a "negative" reply to address A_r saying that server S cannot locate an answer to query Q, and terminate the execution of step 4; otherwise, proceed to step (d).
(d) Select a list L1 of one or more server pairs (Ai, Bi) from the list L. For each server pair (Ai, Bi) on the list L1, form a locate message M(Ai, Bi), which is a copy of message M whose S_back field has been modified to specify Bi, and transmit this message M(Ai, Bi) to server Ai over a virtual point-to-point connection.
(e) For each reply received (by S) to a message sent in step (d), act as follows: (i) If a "positive" reply arrives to a locate message M(Ai, Bi), then forward this reply to A_r and terminate step 4 immediately. (ii) If a "negative" reply arrives to a locate message M(Ai, Bi), then remove the pair (Ai, Bi) from the list L1. (iii) If the message M(Ai, Bi) could not be successfully delivered to Ai, then remove the pair (Ai, Bi) from the list L1, and add the pair (Ci, Ai) to the list L1 for each Ci other than Bi that is directly linked to Ai in the locally stored subtree of MT(C).
(f) Once L1 no longer contains any pair (Ai, Bi) for which a message M(Ai, Bi) has been sent, or after a fixed period of time has elapsed, return to step (c).

Retrieving Files from a Multicast Tree

When a processor q in the network wishes to retrieve the file associated with a given target object, it executes the following steps. These steps are initiated by an entity E, which may be either a user entering commands via a keyboard at a client q, as illustrated in FIG. 3, or an automatic software process resident on a client or server processor q.

1. Processor q forms a query Q that asks whether the recipient (a core server for cluster C) still stores a file F that was previously multicast to the multicast tree MT(C); if so, the recipient server should reply with its own server name. Note that processor q must already know the name of file F and the identity of cluster C; typically, this information is provided to entity E by a service such as the news clipping service or browsing system described below.
Such a service must identify files to the user by a (name, multicast topic) pair.

2. Processor q forms a query message M that poses query Q to the multicast tree MT(C).

3. Processor q pseudonymously transmits message M to the user's proxy server D, as described above.

4. Processor q receives a response M2 to message M.

5. If the response M2 is "positive," that is, it names a server S that still stores file F, then processor q pseudonymously instructs the user's proxy server D to retrieve file F from server S. If the retrieval fails because server S has deleted file F since it answered the query, then client q returns to step 1.

6. If the response M2 is "negative," that is, it indicates that no server in MT(C) still stores file F, then processor q forms a query Q that asks the recipient for the address A of the entity that maintains file F; this entity will ordinarily maintain a copy of file F indefinitely. All core servers in MT(C) ordinarily retain this information (unless instructed to delete it by the maintaining entity), even if they delete file F for space reasons. Therefore, processor q should receive a response providing address A, whereupon processor q pseudonymously instructs the user's proxy server D to retrieve file F from address A.

When multiple versions of a file F exist on local servers throughout the data communication network N, but are not marked as alternate versions of the same file, the system's ability to rapidly locate files similar to F (by treating them as target objects and applying the methods disclosed in "Searching for Target Objects" above) makes it possible to find all the alternate versions, even if they are stored remotely. These related data files may then be reconciled by any method. In a simple instantiation, all versions of the data file would be replaced with the version that had the latest date or version number. In another instantiation, each version would be automatically annotated with references or pointers to the other versions.

NEWS CLIPPING SERVICE

The system for customized electronic identification of desirable objects of the present invention can be used in the electronic media system of FIG. 1 to implement an automatic news clipping service which learns to select (filter) news articles to match a user's interests, based solely on which articles the user chooses to read. The system for customized electronic identification of desirable objects generates a target profile for each article that enters the electronic media system, based on the relative frequency of occurrence of the words contained in the article. The system for customized electronic identification of desirable objects also generates a search profile set for each user, as a function of the target profiles of the articles the user has accessed and the relevance feedback the user has provided on these articles. As new articles are received for storage on the mass storage systems SS1-SSm of the information servers, the system for customized electronic identification of desirable objects generates their target profiles. The generated target profiles are later compared to the search profiles in the users' search profile sets, and those new articles whose target profiles are closest (most similar) to the closest search profile in a user's search profile set are identified to that user for possible reading. The computer program providing the articles to the user monitors how much the user reads (the number of screens of data and the number of minutes spent reading), and adjusts the search profiles in the user's search profile set to more closely match what the user apparently prefers to read. The details of the method used by this system are disclosed in flow diagram form in FIG. 5. This method requires selecting a specific method of calculating user-specific search profile sets, of measuring similarity between two profiles, and of updating a user's search profile set (or more generally target profile interest summary) based on what the user read, and the examples disclosed herein are examples of the many possible implementations that can be used and should not be construed to limit the scope of the system.

Initialize Users' Search Profile Sets

The news clipping service instantiates target profile interest summaries as search profile sets, so that a set of high-interest search profiles is stored for each user. The search profiles associated with a given user change over time. As in any application involving search profiles, they can be initially determined for a new user (or explicitly altered by an existing user) by any of a number of procedures, including the following preferred methods: (1) asking the user to specify search profiles directly by giving keywords and/or numeric attributes, (2) using copies of the profiles of target objects or target clusters that the user indicates are representative of his or her interest, (3) using a standard set of search profiles copied or otherwise determined from the search profile sets of people who are demographically similar to the user.

Retrieve New Articles from Article Source

Articles are available on-line from a wide variety of sources. In the preferred embodiment, one would use the current day's news as supplied by a news source, such as the AP or Reuters news wire. These news articles are input to the electronic media system by being loaded into the mass storage system SS4 of an information server S4. The article profile module 201 of the system for customized electronic identification of desirable objects can reside on the information server S4 and operates pursuant to the steps illustrated in the flow diagram of FIG. 5, where, as each article is received at step 501 by the information server S4, the article profiler at step 502 generates a target profile for the article and stores the target profile in an article indexing memory (typically part of mass storage system SS4) for later use in selectively delivering articles to users. This method is equally useful for selecting which articles to read from electronic news groups and electronic bulletin boards, and can be used as part of a system for screening and organizing electronic mail ("e-mail").

Calculate Article Profiles

A target profile is computed for each new article, as described earlier. The most important attribute of the target profile is a textual attribute that stands for the entire text of the article. This textual attribute is represented, as described earlier, as a vector of numbers, which numbers in the preferred embodiment include the relative frequencies (TF/IDF scores) of word occurrences in this article relative to other comparable articles. The server must count the frequency of occurrence of each word in the article in order to compute the TF/IDF scores.

These news articles are then hierarchically clustered in a hierarchical cluster tree at step 503, which serves as a decision tree for determining which news articles are closest to the user's interest. The resulting clusters can be viewed as a tree in which the top of the tree includes all target objects and branches further down the tree represent divisions of the set of target objects into successively smaller subclusters of target objects. Each cluster has a cluster profile, so that at each node of the tree, the average target profile (centroid) of all target objects stored in the subtree rooted at that node is stored. This average of target profiles is computed over the representation of target profiles as vectors of numeric attributes, as described above.
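The TF/IDF scores mentioned under Calculate Article Profiles can be computed along the following lines. This sketch assumes articles are already tokenized into word lists and uses one common TF/IDF variant (term frequency normalized by article length, logarithmic inverse document frequency); the exact weighting defined earlier in this specification may differ, and `tfidf_profile` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_profile(article: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF/IDF score for every word in `article`, relative to the articles in `corpus`.

    Assumed variant: tf = count / article length, idf = log(N / df).
    `corpus` must include `article` itself, so every df is at least 1.
    """
    counts = Counter(article)                # frequency of occurrence of each word
    doc_sets = [set(doc) for doc in corpus]  # for document-frequency lookups
    n_docs = len(corpus)
    profile: Dict[str, float] = {}
    for word, count in counts.items():
        df = sum(1 for doc in doc_sets if word in doc)
        profile[word] = (count / len(article)) * math.log(n_docs / df)
    return profile
```

A word that appears in every article gets an inverse document frequency of log(1) = 0, so very common words contribute nothing to the textual attribute.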



06/20/2003, EAST Version: 1.04. 0000 



5,754,938 

57 58 

Compare Current Articles' Target Profiles to a User's Search Profiles

The process by which a user employs this apparatus to retrieve news articles of interest is illustrated in flow diagram form in FIG. 11. At step 1101, the user logs into the data communication network N via their client processor C1 and activates the news reading program. This is accomplished by the user establishing a pseudonymous data communications connection, as described above, to a proxy server S2, which provides front-end access to the data communication network N. The proxy server S2 maintains a list of authorized pseudonyms and their corresponding public keys and provides access and billing control. The user has a search profile set stored in the local data storage medium on the proxy server S2. When the user requests access to "news" at step 1102, the profile matching module 213 resident on proxy server S2 sequentially considers each search profile p from the user's search profile set to determine which news articles are most likely of interest to the user. The news articles were automatically clustered into a hierarchical cluster tree at an earlier step so that the determination can be made rapidly for each user. The hierarchical cluster tree serves as a decision tree for determining which articles' target profiles are most similar to search profile p: the search for relevant articles begins at the top of the tree, and at each level of the tree the branch or branches are selected which have cluster profiles closest to p. This process is recursively executed until the leaves of the tree are reached, identifying individual articles of interest to the user, as described in the section "Searching for Target Objects" above.

A variation on this process exploits the fact that many users have similar interests. Rather than carry out steps 5-9 of the above process separately for each search profile of each user, it is possible to achieve added efficiency by carrying out these steps only once for each group of similar search profiles, thereby satisfying many users' needs at once. In this variation, the system begins by non-hierarchically clustering all the search profiles in the search profile sets of a large number of users. For each cluster k of search profiles, with cluster profile p_k, it uses the method described in the section "Searching for Target Objects" to locate articles with target profiles similar to p_k. Each located article is then identified as of interest to each user who has a search profile represented in cluster k of search profiles.

Notice that the above variation attempts to match clusters of search profiles with similar clusters of articles. Since this is a symmetrical problem, it may instead be given a symmetrical solution, as the following more general variation shows. At some point before the matching process commences, all the news articles to be considered are clustered into a hierarchical tree, termed the "target profile cluster tree," and the search profiles of all users to be considered are clustered into a second hierarchical tree, termed the "search profile cluster tree." The following steps serve to find all matches between individual target profiles from any target profile cluster tree and individual search profiles from any search profile cluster tree:

1. For each child subtree S of the root of the search profile cluster tree (or, let S be the entire search profile cluster tree if it contains only one search profile):
2. Compute the cluster profile P_S to be the average of all search profiles in subtree S.
3. For each subcluster (child subtree) T of the root of the target profile cluster tree (or, let T be the entire target profile cluster tree if it contains only one target profile):
4. Compute the cluster profile P_T to be the average of all target profiles in subtree T.
5. Calculate d(P_S, P_T), the distance between P_S and P_T.
6. If d(P_S, P_T) < t, a threshold, then:
7. If S contains only one search profile and T contains only one target profile, declare a match between that search profile and that target profile.
8. Otherwise, recurse to step 1 to find all matches between search profiles in tree S and target profiles in tree T.

The threshold used in step 6 is typically an affine function or other function of the greater of the cluster variances (or cluster diameters) of S and T. Whenever a match is declared between a search profile and a target profile, the target object that contributed the target profile is identified as being of interest to the user who contributed the search profile. Notice that the process can be applied even when the set of users to be considered or the set of target objects to be considered is very small. In the case of a single user, the process reduces to the method given for identifying articles of interest to a single user. In the case of a single target object, the process constitutes a method for identifying users to whom that target object is of interest.

Present List of Articles to User

Once the profile correlation step is completed for a selected user or group of users, at step 1104 the profile processing module 213 stores a list of the identified articles for presentation to each user. At a user's request, the profile processing system 213 retrieves the generated list of relevant articles and presents this list of titles of the selected articles to the user, who can then select at step 1105 any article for viewing. (If no titles are available, then the first sentence(s) of each article can be used.) The list of article titles is sorted according to the degree of similarity of the article's target profile to the most similar search profile in the user's search profile set. The resulting sorted list is either transmitted in real time to the user client processor C1, if the user is present at their client processor C1, or can be transmitted to a user's mailbox, resident on the user's client processor C1 or stored within the server S2 for later retrieval by the user. Other methods of transmission include facsimile transmission of the printed list or telephone transmission by means of a text-to-speech system. The user can then transmit a request by computer, facsimile, or telephone to indicate which of the identified articles the user wishes to review, if any. The user can still access all articles in any information server S4 to which the user has authorized access; however, those lower on the generated list are simply further from the user's interests, as determined by the user's search profile set. The server retrieves the article from the local data storage medium or from an information server S4 and presents the article one screen at a time to the user's client processor C1. The user can at any time select another article for reading or exit the process.

Monitor Which Articles Are Read

The user's search profile set generator 292 at step 1107 monitors which articles the user reads, keeping track of how many pages of text are viewed by the user, how much time is spent viewing the article, and whether all pages of the article were viewed. This information can be combined to measure the depth of the user's interest in the article, yielding a passive relevance feedback score, as described earlier. Although the exact details depend on the length and nature of the articles being searched, a typical formula might be:

measure of article attractiveness = 0.2 if the second page is viewed + 0.2 if all pages are viewed + 0.2 if more than 30 seconds was spent on the article + 0.2 if more than one minute was spent on the article + 0.2 if the minutes spent in the article are greater than half the number of pages.

The computed measure of article attractiveness can then be used as a weighting function to adjust the user's search profile set to thereby more accurately reflect the user's dynamically changing interests.
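The symmetric tree-against-tree matching in steps 1-8 can be sketched as a single recursive function. This is a minimal sketch under stated assumptions: Euclidean distance stands in for the similarity distance d(P_S, P_T), the largest member-to-centroid distance (`spread`) stands in for the cluster diameter, and the affine threshold t = a + b * max(spread) uses hypothetical constants `a` and `b`. The `Node` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Cluster-tree node; each leaf holds exactly one (search or target) profile."""
    label: str = ""                  # owner of a leaf profile
    profile: Tuple[float, ...] = ()  # only meaningful at leaves
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def centroid(tree: Node) -> List[float]:
    # Cluster profile: average of all profiles in the subtree (steps 2 and 4).
    vecs = [leaf.profile for leaf in tree.leaves()]
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]


def spread(tree: Node, center: Sequence[float]) -> float:
    # Stand-in for the cluster diameter: largest distance from a member to the centroid.
    return max(math.dist(leaf.profile, center) for leaf in tree.leaves())


def match_trees(s_tree: Node, t_tree: Node, a: float = 0.5, b: float = 2.0,
                out: Optional[List[Tuple[str, str]]] = None) -> List[Tuple[str, str]]:
    """Find (search profile, target profile) matches by descending both trees at once."""
    if out is None:
        out = []
    for s_sub in s_tree.children or [s_tree]:      # step 1
        p_s = centroid(s_sub)                      # step 2
        for t_sub in t_tree.children or [t_tree]:  # step 3
            p_t = centroid(t_sub)                  # step 4
            # Assumed affine threshold on the larger spread of the two clusters.
            t = a + b * max(spread(s_sub, p_s), spread(t_sub, p_t))
            if math.dist(p_s, p_t) < t:            # steps 5 and 6
                if not s_sub.children and not t_sub.children:
                    out.append((s_sub.label, t_sub.label))  # step 7: declare a match
                else:
                    match_trees(s_sub, t_sub, a, b, out)    # step 8: recurse
    return out
```

The work saved comes from step 6: a pair of subtrees whose centroids are far apart relative to their spread is never expanded, and the spread term widens the threshold for loose clusters so that fewer true matches are pruned near cluster boundaries.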
Update User Profiles 

Updating of a user's generated search profile set can be
done at step 1106 using the method described in copending
U.S. patent application Ser. No. 08/346,425. When an article
is read, the server shifts each search profile in the set
slightly in the direction of the target profiles of those nearby
articles for which the computed measure of article
attractiveness was high. Given a search profile with attributes
u_ik from a user's search profile set, and a set of J articles
available with attributes d_jk (assumed correct for now),
where i indexes users, j indexes articles, and k indexes
attributes, user i would be predicted to pick a set of P distinct
articles to minimize the sum of d(u_i, d_j) over the chosen
articles j. The user's desired attributes u_ik and an article's
attributes d_jk would be some form of word frequencies such
as TF/IDF and potentially other attributes such as the source,
reading level, and length of the article, while d(u_i, d_j) is the
distance between these two attribute vectors (profiles) using
the similarity measure described above. If the user picks a
different set of P articles than was predicted, the user search
profile set generation module should try to adjust u and/or d to
more accurately predict the articles the user selected. In
particular, u_i and/or d_j should be shifted to increase their
similarity if user i was predicted not to select article j but did
select it, and perhaps also to decrease their similarity if user
i was predicted to select article j but did not. A preferred
method is to shift u for each wrong prediction that user i will
not select article j, using the formula:

$$u_{ik}' = u_{ik} - \epsilon \, \frac{\partial d(u_i, d_j)}{\partial u_{ik}}$$

Here u_i is chosen to be the search profile from user i's
search profile set that is closest to target profile d_j. If ε is
positive, this adjustment increases the match between user
i's search profile set and the target profiles of the articles
user i actually selects, by making u_i closer to d_j for the case
where the algorithm failed to predict an article that the
viewer selected. The size of ε determines how many
example articles one must see to change the search profile
substantially. If ε is too large, the algorithm becomes
unstable, but for sufficiently small ε, it drives u to its correct
value. In general, ε should be proportional to the measure of
article attractiveness; for example, it should be relatively
high if user i spends a long time reading article j. One could
in theory also use the above formula to decrease the match
in the case where the algorithm predicted an article that the
user did not read, by making ε negative in that case.
However, there is no guarantee that u will move in the
correct direction in that case. One can also shift the attribute
weights w_ik of user i by using a similar algorithm:

$$w_{ik}' = \frac{w_{ik} - \epsilon \, \partial d(u_i, d_j) / \partial w_{ik}}{\sum_{k} \left( w_{ik} - \epsilon \, \partial d(u_i, d_j) / \partial w_{ik} \right)}$$

This is particularly important if one is combining word frequencies with other attributes. As before, this increases the match if ε is positive, which is the case where the algorithm failed to predict an article that the user read. This time the increase comes from decreasing the weights on those characteristics for which the user's search profile u_i differs from the article's profile d_j. Again, the size of ε determines how many example articles one must see to replace what was originally believed. Unlike the procedure for adjusting u_i, one can also make use of the fact that the above algorithm decreases the match if ε is negative, which is the case where the algorithm predicted an article that the user did not read. The denominator of the expression prevents weights from shrinking to zero over time by renormalizing the modified weights w_i so that they sum to one. Both u_i and w_i can be adjusted for each article accessed. When ε is small, as it should be, there is no conflict between the two parts of the algorithm. The selected user's search profile set is updated at step 1118.
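The two update rules above can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes that profiles and weights are plain lists of floats, that "closest" means weighted squared Euclidean distance, and that the weight step uses u_i as it was before the same article moved it. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_sq_distance(u: Sequence[float], d: Sequence[float],
                         w: Sequence[float]) -> float:
    """Weighted squared Euclidean distance (an assumed distance measure)."""
    return sum(wk * (uk - dk) ** 2 for uk, dk, wk in zip(u, d, w))


def closest_profile_index(profiles: Sequence[Sequence[float]],
                          d: Sequence[float], w: Sequence[float]) -> int:
    """Index of the search profile u_i closest to the article profile d_j."""
    return min(range(len(profiles)),
               key=lambda i: weighted_sq_distance(profiles[i], d, w))


def update_profile(u: Sequence[float], d: Sequence[float],
                   eps: float) -> Vector:
    """u_i <- u_i + eps * (d_j - u_i); eps > 0 moves u_i toward d_j."""
    return [uk + eps * (dk - uk) for uk, dk in zip(u, d)]


def update_weights(w: Sequence[float], u: Sequence[float],
                   d: Sequence[float], eps: float) -> Vector:
    """Shrink weights where u_i and d_j differ, then renormalize to sum to 1."""
    raw = [wk - eps * (uk - dk) ** 2 for wk, uk, dk in zip(w, u, d)]
    if any(r <= 0 for r in raw):
        # The text only claims good behavior for sufficiently small eps.
        raise ValueError("eps too large: a weight would become non-positive")
    total = sum(raw)
    return [r / total for r in raw]


def learn_from_missed_selection(profiles: List[Vector], w: Vector,
                                d: Vector, eps: float) -> Vector:
    """Move the closest profile toward d (in place) and return new weights."""
    i = closest_profile_index(profiles, d, w)
    new_w = update_weights(w, profiles[i], d, eps)  # uses u_i before the move
    profiles[i] = update_profile(profiles[i], d, eps)
    return new_w
```

Keeping ε small, as the text advises, keeps every raw weight positive. That is why the sketch simply rejects an ε that would drive a weight to zero or below instead of clamping it.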



Further Applications of the Filtering Technology 

The news clipping service may deliver news articles (or advertisements and coupons for purchasables) to off-line users as well as to users who are on-line. Off-line users may have no way of providing relevance feedback. However, the user profile of an off-line user U may be similar to the profiles of on-line users, for example because user U is demographically similar to these other users. The level of user U's interest in particular target objects can therefore be estimated via the general interest-estimation methods described earlier. In one application, the news clipping service chooses a set of news articles (respectively, advertisements and coupons) that are predicted to be of interest to user U. This set determines the content of a customized newspaper (respectively, advertising/coupon circular) that may be printed and physically sent to user U via other methods. In general, the target objects included in the printed document delivered to user U are those with the highest median predicted interest among a group G of users. Group G consists of one of the following: the single off-line user U, a set of off-line users who are demographically similar to user U, or a set of off-line users who are in the same geographic area and thus on the same newspaper delivery route. In a variation, user group G is clustered into several subgroups G_1, ..., G_k, and an average user profile P_i is created from each subgroup G_i. For each article T and each user profile P_i, the interest in T by a hypothetical user with user profile P_i is predicted. The interest of article T to group G is taken to be the maximum interest in article T by any of these k hypothetical users. Finally, the customized newspaper for user group G is constructed from those articles of greatest interest to group G.
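The two group-scoring rules in the preceding paragraph can be compared in a short sketch. The first is median interest over G; the second is maximum interest over the averaged subgroup profiles P_1, ..., P_k. The sketch assumes that the subgroups have already been clustered. In place of the interest-estimation methods described earlier, it uses a hypothetical `toy_interest` function (negative squared distance); the article names and profiles are illustrative only.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence

Profile = Sequence[float]
InterestFn = Callable[[Profile, Profile], float]


def toy_interest(user: Profile, article: Profile) -> float:
    """Hypothetical stand-in for predicted interest: closer profiles score higher."""
    return -sum((u - a) ** 2 for u, a in zip(user, article))


def average_profile(profiles: Sequence[Profile]) -> List[float]:
    """Attribute-wise mean of a subgroup's user profiles (the profile P_i)."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def median_group_interest(group: Sequence[Profile], article: Profile,
                          interest: InterestFn = toy_interest) -> float:
    """Median predicted interest in an article across all users in G."""
    return median(interest(user, article) for user in group)


def subgroup_max_interest(subgroups: Sequence[Sequence[Profile]],
                          article: Profile,
                          interest: InterestFn = toy_interest) -> float:
    """Max interest over hypothetical users with profiles P_1 ... P_k."""
    return max(interest(average_profile(g), article) for g in subgroups)


def build_newspaper(articles: Dict[str, Profile],
                    score: Callable[[Profile], float], n: int) -> List[str]:
    """Names of the n articles with the highest group score (ties keep input order)."""
    ranked = sorted(articles, key=lambda name: score(articles[name]),
                    reverse=True)
    return ranked[:n]


subgroups = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 10.0]]]
everyone = [user for g in subgroups for user in g]
articles = {"a": [0.0, 1.0], "b": [10.0, 10.0], "c": [5.0, 5.0]}

by_median = build_newspaper(
    articles, lambda art: median_group_interest(everyone, art), 2)
by_subgroup = build_newspaper(
    articles, lambda art: subgroup_max_interest(subgroups, art), 2)
print(by_median, by_subgroup)  # ['a', 'c'] ['a', 'b']
```

With these toy numbers, the median rule picks article c as a compromise. The subgroup-maximum rule instead includes article b, which only the one-member subgroup favors. This illustrates how taking the maximum lets a small subgroup's preferred article into the newspaper.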

The filtering technology of the news clipping service is not limited to news articles provided by a single source; it may be extended to articles or target objects collected from any number of sources. For example, rather than identifying new news articles of interest, the technology may identify new or updated World Wide Web pages of interest.

A second application is termed "broadcast clipping." Here individual users desire to broadcast messages to all interested users. The pool of news articles is replaced by a pool of messages to be broadcast, and these messages are sent to the broadcast-clipping-service subscribers most interested in them.

In a third application, the system scans the transcripts of all real-time spoken or written discussions on the network that are currently in progress and designated as public. It then employs the news-clipping technology to rapidly identify discussions that the user may be interested in joining, or to rapidly identify and notify users who may be interested in joining an ongoing discussion.

In a fourth application, the method is used as a post-process that filters and ranks, in order of interest, the many target objects found by a conventional database search. Examples include a search for all homes selling for under $200,000 in a given area, for all 1994 news articles about Marcia Clark, or for all Italian-language films.

In a fifth application, the method is used to filter and rank the links in a hypertext document by estimating the user's interest in the document or other object associated with each link.

In a sixth application, paying advertisers, who may be companies or individuals, are the source of advertisements or other messages, which take the place of the news articles in the news clipping service. A consumer who buys a product is deemed to have provided positive relevance feedback on advertisements for that product. A consumer who buys a product apparently because of a particular advertisement (for example, by using a coupon clipped from that advertisement) is deemed to have provided particularly
high relevance feedback on that advertisement. Such feedback may be communicated to a proxy server by any of the following:

- the consumer's client processor (if the consumer is making the purchase electronically);
- the retail vendor;
- the credit-card reader (at the vendor's establishment) that the consumer uses to pay for the purchase.

Given a database of such relevance feedback, the disclosed technology is then used to match advertisements with those users who are most interested in them. Advertisements selected for a user are presented to that user by any one of several means, including electronic mail, automatic display on the user's screen, or printing on a printer at a retail establishment where the consumer is paying for a purchase. The threshold distance used to identify interest may be increased for a particular advertisement in accordance with the amount that the advertiser is willing to pay. This causes the system to present that advertisement to more users.

A further use of the capabilities of this system is to manage a user's investment portfolio. Instead of recommending articles to the user, the system recommends target objects that are investments. As illustrated above by the example of stock market investments, many different attributes can be used together to profile each investment. The user's past investment behavior is characterized in the user's search profile set or target profile interest summary. This information is used to match the user with stock opportunities (target investments). The rapid profiling method described above may be used to determine a rough set of preferences for new users.

Quality attributes used in this system can include negatively weighted attributes. One example is a measurement of fluctuations in dividends historically paid by the investment: this quality attribute would have a strongly negative weight for a conservative investor dependent on a regular flow of investment income.

Furthermore, the system can monitor stock prices and automatically take certain actions when certain stock performance characteristics are met, such as placing buy or sell orders or paging the user with a notification. Thus, the system can immediately notify the user when a selected stock reaches a predetermined price, without the user having to monitor the stock market activity.

The user's investments can be profiled in part by a "type of investment" attribute (to be used in conjunction with other attributes). This attribute distinguishes among bonds, mutual funds, growth stocks, income stocks, etc., thereby segmenting the user's portfolio according to investment type. Each investment type can then be managed to identify investment opportunities, and the user can identify the desired ratio of investment capital for each type.

E-mail Filter

In addition to the news clipping service described above, the system for customized electronic identification of desirable objects functions in an e-mail environment in a similar but slightly different manner. The news clipping service selects and retrieves news information that would not otherwise reach its subscribers. At the same time, however, large numbers of e-mail messages do reach users, having been generated and sent by humans or automatic programs. These users need an e-mail filter that automatically processes the messages received. The necessary processing includes determining the action to be taken with each message, including, but not limited to, the following:

- filing the message;
- notifying the user of receipt of a high-priority message;
- automatically responding to a message.

The e-mail filter system must not require too great an investment on the part of the user to learn and use. The user must also have confidence in the appropriateness of the actions automatically taken by the system. The same filter may be applied to voice mail messages or facsimile messages that have been converted into electronically stored text, whether automatically or at the user's request, via well-known techniques for speech recognition or optical character recognition.

The filtering problem can be defined as follows. A message processing function MPF(·) maps from a received message (document) to one or more of a set of actions. The actions, which may be quite specific, may be either predefined or customized by the user. Each action A has an appropriateness function F_A(·,·). The value F_A(U, D) is a real number representing the appropriateness of selecting action A on behalf of user U when user U is in receipt of message D.

For example, suppose D comes from a credible source and is marked urgent. Then discarding the message has a high cost to the user and has low appropriateness, so F_discard(U, D) is small. Alerting the user of receipt of the message, by contrast, is highly appropriate, so F_alert(U, D) is large.

Given the determined appropriateness functions, the function MPF(D) is used to automatically select the appropriate action or actions. As an example, the following set of actions might be useful:

1. Urgently notify user of receipt of message
2. Insert message into queue for user to read later
3. Insert message into queue for user to read later, and suggest that user reply
4. Insert message into queue for user to read later, and suggest that user forward it to individual R
5. Summarize message and insert summary into queue
6. Forward message to user's secretary
7. File message in directory X
8. File message in directory Y
9. Delete message (i.e., ignore message and do not save)
10. Notify sender that further messages on this subject are unwanted

Notice that actions 9 and 10 in the sample list above are designed to filter out unwanted messages, meaning messages that are undesirable to the user or that are received from undesirable sources such as pesky salespersons. They do so by deleting the unwanted message and/or sending a reply indicating that messages of this type will not be read.

The appropriateness functions must be tailored to describe the appropriateness of carrying out each action given the target profile for a particular document. A message processing function MPF can then be found that is in some sense optimal with respect to the appropriateness functions. One reasonable choice of MPF always picks the action with highest appropriateness. Where multiple actions are highly appropriate and also compatible with each other, it selects more than one action. For example, it may automatically reply to a message and also file the same message in directory X, so that the value of MPF(D) is the set {reply, file in directory X}.

The system asks the user for confirmation in two situations:

- when the appropriateness of even the most appropriate action falls below a user-specified threshold, as should happen for messages of an unfamiliar type;
- when MPF selects one action over another action that is nearly as appropriate. For example, mail should not be deleted if it is nearly as appropriate to let the user see it.

It is possible to write appropriateness functions manually, but the time required and the lack of user expertise make this solution impractical. The automatic training of this system is preferable, using the automatic user profiling system
described above. Each received document is viewed as a target object whose profile includes such attributes as:

- the entire text of the document (represented as TF/IDF scores);
- document sender;
- date sent;
- document length;
- date of the last document received from this sender;
- key words;
- list of other addressees, etc.

It was disclosed above how to estimate an interest function on profiled target objects, using relevance feedback together with measured similarities among target objects and among users. In the context of the e-mail filter, the task is to estimate several appropriateness functions F_A(·,·), one per action A, using the same method that was used earlier to estimate a single interest function.

Relevance feedback in this case is provided by the user's observed actions over time. A user U may choose action A on document D, either freely or by choosing or confirming an action recommended by the system. Whenever this happens, it is taken to mean that the appropriateness of action A on document D is high, particularly if the user takes action A immediately after seeing document D.

A presumption of no appropriateness (corresponding to the earlier presumption of no interest) is used. Under it, action A is considered inappropriate on a document unless the user or similar users have taken action A on this document or similar documents. In particular, if no similar document has been seen, no action is considered especially appropriate. The e-mail filter then asks the user either to specify the appropriate action or to confirm that the action chosen by the e-mail filter is the appropriate one.

Thus, the e-mail filter learns to take particular actions on e-mail messages that have certain attributes or combinations of attributes. For example, messages from John Doe that originate in the (212) area code may prompt the system to forward a copy by fax transmission to a given fax number, or to file the message in directory X on the user's client processor. A variation allows active requests of this form from the user, such as a request that any message from John Doe be forwarded to a desired fax number until further notice. This active user input requires a natural-language or icon-based interface in which specific commands are associated with particular attributes and combinations of attributes.

Update Notification

A very important and novel characteristic of the architecture is its ability to notify the user of new or updated target objects that are relevant to the user, as determined by the user's search profile set or target profile interest summary. (Updated target objects include new versions of documents and new models of purchasable goods.) The system may notify the user of these relevant target objects by an electronic notification such as an e-mail message or facsimile transmission.

In the variation where the system sends an e-mail message, the user's e-mail filter can then respond appropriately to the notification. For instance, it can bring the notification immediately to the user's personal attention. Alternatively, it can automatically submit an electronic request to retrieve or purchase the target object described in the notification. A simple example of the latter response is for the e-mail filter to retrieve an on-line document at a nominal or zero charge. Another is for it to request to buy a purchasable of limited availability, such as a used product or an auction item.

ACTIVE NAVIGATION (BROWSING)

Browsing by Navigating Through a Cluster Tree

A hierarchical cluster tree imposes a useful organization on a collection of target objects. The tree is of direct use to a user who wishes to browse through all the target objects in the tree, whether or not the user is exploring the collection with a well-specified goal. The tree's division of target objects into coherent clusters provides an efficient method whereby the user can locate a target object of interest.

Navigation proceeds as follows:

1. The user first chooses one of the highest-level (largest) clusters from a menu.
2. The user is presented with a menu listing the subclusters of that cluster and may select one of them.
3. The system locates the selected subcluster via the appropriate pointer stored with the larger cluster, and lets the user select one of its subclusters from another menu.
4. This process is repeated until the user reaches a leaf of the tree, which gives the details of an actual target object.

Hierarchical trees allow rapid selection of one target object from a large set. With ten levels of menus of ten items (subclusters) each, one can reach 10^10 = 10,000,000,000 (ten billion) items.

In the preferred embodiment, the user views the menus on a computer screen or terminal and makes selections with a keyboard or mouse. However, the user may also make selections over the telephone: a voice synthesizer reads the menus, and the user selects subclusters via the telephone's touch-tone keypad. In another variation, the user simultaneously maintains two connections to the server. One is a telephone voice connection, and the other is a second connection over which the server sends successive menus. The user selects choices via the telephone's touch-tone keypad.

User profiles may include an associative attribute indicating the user's degree of interest in each target object. It is also useful to augment the user profile with an additional associative attribute indicating the user's degree of interest in each cluster in the hierarchy. This degree of interest may be estimated numerically from the subclusters or target objects the user has selected. It is the number selected from the given cluster or its subclusters, expressed as a proportion of the total number of subclusters or target objects the user has selected.

This associative attribute is particularly valuable if the hierarchical tree was built using soft or "fuzzy" clustering, which allows a subcluster or target object to appear in multiple clusters. Suppose a document appears in both the "sports" and "humor" clusters, and the user selects it from a menu associated with the "humor" cluster. The system then increases its association between the user and the "humor" cluster, but not its association between the user and the "sports" cluster.

Labeling Clusters

A user who is navigating the cluster tree is repeatedly expected to choose one of several subclusters from a menu. These subclusters must be usefully labeled (at step 5*3) in such a way as to suggest their content to the human user. It is straightforward to include basic information about each subcluster in its label. Examples are the number of target objects the subcluster contains (possibly just 1) and how many of these have been added or updated recently. However, it is also necessary to display additional information about the cluster's content. This content-descriptive information may be provided by a human, particularly for large or frequently accessed clusters, but it may also be generated automatically.

The basic automatic technique is simply to display the cluster's "characteristic value" for each of a few highly weighted attributes. For numeric attributes, this may be taken to mean the cluster's average value for that attribute. Suppose, for example, that the "year of release" attribute is highly weighted in predicting which movies a user will like. It is then useful to display the average year of release as part of each cluster's label. The user thus sees that one cluster consists of movies released around 1962, while another consists of movies from around 1982. For short textual attributes, such as "title of movie" or
"title of document," the system can display the attribute's value for the cluster member (target object) whose profile is most similar to the cluster's profile (the mean profile for all members of the cluster). An example is the title of the most typical movie in the cluster.

For longer textual attributes, a useful technique is to select certain terms by comparing their scores. For each term, take the amount by which its average TF/IDF score across members of the cluster exceeds its average TF/IDF score across all target objects. The terms for which this excess is greatest are selected. The excess may be measured in absolute terms or as a fraction of the standard deviation of the term's TF/IDF score across all target objects.

The selected terms are replaced with their morphological stems, eliminating duplicates. If both "slept" and "sleeping" were selected, for instance, they would be replaced by the single term "sleep." Close synonyms or collocates may optionally be eliminated as well. If both "nurse" and "medical" were selected, they might both be replaced by a single term such as "nurse," "medical," "medicine," or "hospital." The resulting set of terms is displayed as part of the label.

Finally, some target objects in the cluster may have freely redistributable thumbnail photographs or other graphical images associated with them for labeling purposes. In that case, the system can display, as part of the label, the image or images whose associated target objects have target profiles most similar to the cluster profile.

Users' navigational patterns may provide some useful feedback on the quality of the labels. In particular, users may often select a particular cluster to explore but then quickly backtrack and try a different cluster. This may signal that the first cluster's label is misleading. Insofar as other terms and attributes can provide "next-best" alternative labels for the first cluster, such "next-best" labels can be automatically substituted for the misleading label.

In addition, any user can locally relabel a cluster for his or her own convenience. A cluster label provided by a user is in general visible only to that user. It is nevertheless possible to make global use of these labels via a "user labels" textual attribute for target objects. For a given target object, this attribute is defined as the concatenation of all labels provided by any user for any cluster containing that target object.

This attribute influences similarity judgments. Consider a cluster often labeled "Sports News" by users and an otherwise dissimilar cluster often labeled "International News." The system may regard target articles in the first as mildly similar to articles in the second, precisely because the "user labels" attribute in each cluster profile is strongly associated with the term "News." The "user labels" attribute is also used in the automatic generation of labels, just as other textual attributes are. So if the user-generated labels for a cluster often include "Sports," the term "Sports" may be included in the automatically generated label as well.

It is not necessary for menus to be displayed as simple lists of labeled options. A menu can instead be displayed or printed in a form that shows in more detail how the different menu options relate to each other. Thus, in a variation, the menu options are visually laid out in two dimensions or in a perspective drawing of three dimensions. Each option is displayed or printed as a textual or graphical label. The physical coordinates at which the options are displayed or printed are generated by the following sequence of steps:

1. Construct for each option the cluster profile of the cluster it represents.
2. Construct from each cluster profile its decomposition into a numeric vector, as described above.
3. Apply singular value decomposition (SVD) to determine the set of two or three orthogonal linear axes along which these numeric vectors are most greatly differentiated.
4. Take the coordinates of each option to be the projected coordinates of that option's numeric vector along those axes.

Step 3 may be varied to determine a set of, say, 6 axes, so that step 4 lays out the options in a 6-dimensional space. In this case the user may view the geometric projection of the 6-dimensional layout onto any plane passing through the origin. The user may also rotate this viewing plane to see differing configurations of the options, which emphasize similarity with respect to differing attributes in the profiles of the associated clusters. In the visual representation, the sizes of the cluster labels can be varied according to the number of objects contained in the corresponding clusters.

In a further variation, all options from the parent menu are displayed in some number of dimensions, as just described. The option corresponding to the current menu, however, is replaced by a more prominent subdisplay of the options on the current menu. Optionally, the scale of this composite display may be gradually increased over time. This increases the area of the screen devoted to the options on the current menu and gives the visual impression that the user is regarding the parent cluster and "zooming in" on the current cluster and its subclusters.

Further Navigational

It should be appreciated that a hierarchical cluster tree may be configured with multiple cluster selections branching from each node. Alternatively, the same labeled clusters may be presented in the form of single branches for multiple nodes ordered in a hierarchy. In one variation, the user can also navigate laterally between neighboring clusters. To do so, the user requests that the system search for a cluster whose cluster profile resembles the cluster profile of the currently selected cluster. If this type of navigation is performed at the level of individual objects (leaf ends), then automatic hyperlinks may be created as navigation occurs. This is one way that nearest-neighbor clustering navigation may be performed. For example, in a domain where target objects are home pages on the World Wide Web, a collection of such pages could be laterally linked to create a "virtual mall."

The simplest way to use the automatic menuing system described above is for the user to begin browsing at the top of the tree and move to more specific subclusters. However, in a variation, the user optionally provides a query consisting of textual and/or other attributes. From this query the system constructs a profile in the manner described herein, optionally altering textual attributes as described herein before decomposing them into numeric attributes. Query profiles are similar to the search profiles in a user's search profile set, with two differences. First, their attributes are explicitly specified by a user, most often for one-time usage. Second, unlike search profiles, they are not automatically updated to reflect changing interests.

A typical query in the domain of text articles might have the following attribute values:

- "text of article": "Tell me about the relation between Galileo and the Medici family";
- "reading difficulty": 8 (that is, 8th-grade level).

The system uses the method of section "Searching for Target Objects" above to automatically locate a small set of one or more clusters with profiles similar to the query profile. For example, it might locate clusters whose articles are written at roughly an 8th-grade level and tend to mention Galileo and the Medicis. The user may start browsing at any of these clusters and can move from it to subclusters, superclusters, and other nearby clusters. A user may be looking for something in particular. For such a user, it is generally less efficient to start at the largest cluster and repeatedly select smaller subclusters. It is more efficient to write a brief description of what one is looking for, and then move to nearby clusters if the objects initially recommended are not precisely those desired.
Information retrieval systems customarily match a query to a document. An interesting variation is possible in which a query is matched to an already answered question. The relevant domain is a customer service center, electronic newsgroup, or Better Business Bureau where questions are frequently answered. Each new question-answer pair is recorded for future reference as a target object, with a textual attribute that specifies the question together with the answer provided. As explained earlier with reference to document titles, the question should be weighted more heavily than the answer when this textual attribute is decomposed into TF/IDF scores. A query specifying "Tell me about the relation between Galileo and the Medici family" as the value of this attribute therefore locates a cluster of similar questions together with their answers.

In a variation, each question-answer pair may be profiled with two separate textual attributes, one for the question and one for the answer. A query might then locate a cluster by specifying only the question attribute as the text "Tell me about the relation between Galileo and the Medici family." For completeness, it might instead specify both the question attribute and the (lower-weighted) answer attribute as that text.

The filtering technology described earlier can also aid the user in navigating among the target objects. The system may present the user with a menu of subclusters of a cluster C of target objects. At the same time, it can present an additional menu of the most interesting target objects in cluster C, so that the user can either access a subcluster or directly access one of the target objects. Suppose this additional menu lists n target objects. Its choices are then determined for each i from 1 to n inclusive, in increasing order. The i-th most prominent choice, denoted Top(C, i), is found in two steps:

1. Consider all target objects in cluster C that are further than a threshold distance t from all of Top(C, 1), Top(C, 2), ..., Top(C, i−1).
2. Select the one in which the user's interest is estimated to be highest.

If the threshold distance t is 0, then the menu resulting from this procedure simply displays the n most interesting objects in cluster C. The threshold distance may, however, be increased to achieve more variety in the target objects displayed. Generally, the threshold t is chosen to be an affine function or other function of the cluster variance or cluster diameter of cluster C.

As a novelty feature, user U can "masquerade" as another user V, such as a prominent intellectual or a celebrity supermodel. As long as user U is masquerading as user V, the filtering technology will recommend articles according to user V's preferences rather than user U's. This requires that user U have access to the user-specific data of user V, for example because user V has leased these data to user U for a financial consideration. User U can then masquerade as user V by instructing user U's proxy servers to temporarily substitute user V's user profile and target profile interest summary for user U's.

In a variation, user U has access to an average user profile and a composite target profile interest summary for a group G of users. By instructing proxy server S to substitute these for user U's user-specific data, user U can masquerade as a typical member of group G. This is useful in exploring group preferences for sociological, political, or market research.

The menu presented to the user for navigation need not be exactly isomorphic to the cluster tree. The menu is typically a somewhat modified version of the cluster tree, reorganized manually or automatically so that the clusters most interesting to a user are easily accessible by the user. To automatically reorganize the menu in a user-specific way, the system first attempts automatically to identify existing clusters that are of interest to the user. The system may identify a cluster as interesting because the user has frequently accessed target objects in the cluster. In a more sophisticated version, it may do so because the user is predicted to have high interest in the cluster, on the basis of the methods disclosed herein for estimating interest from relevance feedback.

Several techniques can then be used to make interesting clusters more easily accessible. At the user's request, or at all times, the system can display a special list of the most interesting clusters, or of the most interesting subclusters of the current cluster. The user can then select one of these clusters based on its label and jump directly to it. In general, when the system constructs a list of interesting clusters in this way, the i-th most prominent choice on the list, denoted Top(i), is found in two steps:

1. Consider all appropriate clusters C that are further than a threshold distance t from all of Top(1), Top(2), ..., Top(i−1).
2. Select the one in which the user's interest is estimated to be highest.

Here the threshold distance t is optionally dependent on the computed cluster variance or cluster diameter of the profiles in the latter cluster.

Several techniques that reorganize the hierarchical menu tree are also useful:

- First, menus can be reorganized so that the most interesting subcluster choices appear earliest on the menu or are visually marked as interesting. For example, their labels may be displayed in a special color or typeface, or together with a number or graphical image indicating the likely level of interest.
- Second, interesting clusters can be moved to menus higher in the tree, i.e., closer to the root of the tree, so that they are easier to access if the user starts browsing at the root.
- Third, uninteresting clusters can be moved to menus lower in the tree, to make room for interesting clusters that are being moved higher.
- Fourth, clusters with an especially low interest score (representing active dislike) can simply be suppressed from the menus. Thus, a user with children may assign an extremely negative weight to the "vulgarity" attribute in the determination of q, so that vulgar clusters and documents will not be available at all.

As the interesting clusters and the documents in them migrate toward the top of the tree, a customized tree develops that can be more efficiently navigated by the particular user. If menus are chosen so that each menu item is chosen with approximately equal probability, then the expected number of choices the user has to make is minimized. Suppose, for example, that a user frequently accessed target objects whose profiles resembled the cluster profile of cluster (a, b, d) in FIG. 8. The menu in FIG. 9 could then be modified to show the structure illustrated in FIG. It.

In the variation where the general techniques disclosed herein for estimating a user's interest from relevance feedback are used to identify interesting clusters, it is possible
More generally, uter U may "partially masquerade" as 60 for a user U to supply "temporary relevance feedback** to 
another user V or group G. by instructing proxy server S to indicate a temporary interest that is added to his ox her usual 
ternporarily replace user ITs user-specific data with a interests. This is done by entering a query as described 
weighted average of user ITs user-specific data and the above, Le., a set of textual and other attributes that dosefy 
user-specific data for user V and group G. match the user's interests of the moment This query 
Menu Organization 65 becomes "active,** and affects the system's determination of 

Although the topology of a hierarchical cluster tree is interest in either of two ways. In one approach, an active 

fixed by the techniques mat build the tree, the hierarchical query is treated as if it were any other target object and by 
virtue of being a query, it is taken to have received relevance feedback that indicates especially high interest. In an alternative approach, target objects X whose target profiles are similar to an active query's profile are simply considered to have higher quality q(U, X), in that q(U, X) is incremented by a term that increases with target object X's similarity to the query profile. Either strategy affects the usual interest estimates: clusters that match user U's usual interests (and have high quality q(·)) are still considered to be of interest, and clusters whose profiles are similar to an active query are adjudged to have especially high interest. Clusters that are similar to both the query and the user's usual interests are most interesting of all. The user may modify or deactivate an active query at any time while browsing. In addition, if the user discovers a target object or cluster X of particular interest while browsing, he or she may replace or augment the original (perhaps vague) query profile with the target profile of target object or cluster X, thereby amplifying or refining the original query to indicate a particular interest in objects similar to X. For example, suppose the user is browsing through documents, and specifies an initial query containing the word "Lloyd's," so that the system predicts documents containing the word "Lloyd's" to be more interesting and makes them more easily accessible, even to the point of listing such documents or clusters of such documents, as described above. In particular, certain articles about insurance containing the phrase "Lloyd's of London" are made more easily accessible, as are certain pieces of Welsh fiction containing phrases like "Lloyd's father." The user browses while this query is active and hits upon a useful article describing the relation of Lloyd's of London to other British insurance houses; by replacing or augmenting the query with the full text of this article, the user can turn the attention of the system to other documents that resemble this article, such as documents about British insurance houses, rather than Welsh folk tales.
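The additional menu described earlier for a cluster C (each Top(C, i) chosen as the most interesting object further than a threshold distance t from all earlier choices), combined with the second active-query strategy above (incrementing q(U, X) by a term that grows with X's similarity to the query profile), can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: profiles are plain numeric vectors, profile distance is taken to be Euclidean, similarity to the query is taken to be cosine similarity, and the names `top_menu`, `boosted_interest`, and the `weight` parameter are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Illustrative sketch only: the Euclidean distance, cosine similarity,
# function names, and 'weight' parameter are assumptions, not from the source.
Profile = Sequence[float]


def euclidean(a: Profile, b: Profile) -> float:
    """Distance between two target profiles (assumed metric: Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boosted_interest(base: Dict[str, float],
                     profiles: Dict[str, Profile],
                     query: Optional[Profile],
                     weight: float = 1.0) -> Dict[str, float]:
    """Add weight * similarity-to-active-query to each object's interest.

    With no active query, the base interest estimates are returned unchanged.
    """
    if query is None:
        return dict(base)
    return {x: q + weight * cosine(profiles[x], query) for x, q in base.items()}


def top_menu(cluster: List[str],
             profiles: Dict[str, Profile],
             interest: Dict[str, float],
             n: int,
             t: float) -> List[str]:
    """Pick Top(C, 1), ..., Top(C, n) for the additional menu of cluster C.

    The i-th pick is the most interesting object lying further than
    distance t from every earlier pick; stops early if none qualifies.
    """
    chosen: List[str] = []
    for _ in range(n):
        candidates = [
            x for x in cluster
            if x not in chosen
            and all(euclidean(profiles[x], profiles[c]) > t for c in chosen)
        ]
        if not candidates:
            break
        chosen.append(max(candidates, key=lambda x: interest[x]))
    return chosen
```

With t = 0 the sketch simply returns the n most interesting objects (assuming no two objects share an identical profile); raising t trades some estimated interest for variety, as the text describes.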

In a system where queries are used, it is useful to include in the target profiles an associative attribute that records the associations between a target object and whatever terms are employed in queries used to find that target object. The association score of target object X with a particular query term T is defined to be the mean relevance feedback on target object X, averaged over just those accesses of target object X that were made while a query containing term T was active, multiplied by the negated logarithm of term T's global frequency in all queries. The effect of this associative attribute is to increase the measured similarity of two documents if they are good responses to queries that contain the same terms. A further maneuver can be used to improve the accuracy of responses to a query: in the summation used to determine the quality q(U, X) of a target object X, a term is included that is proportional to the sum of association scores between target object X and each term in the active query, if any, so that target objects that are closely associated with terms in an active query are determined to have higher quality and therefore higher interest for the user.

To complement the system's automatic reorganization of the hierarchical cluster tree, the user can be given the ability to reorganize the tree manually, as he or she sees fit. Any changes are optionally saved on the user's local storage device so that they will affect the presentation of the tree in future sessions. For example, the user can choose to move or copy menu options to other menus, so that useful clusters can thereafter be chosen directly from the root menu of the tree or from other easily accessed or topically appropriate menus. In another example, the user can select clusters C1, C2, . . . , Ck listed on a particular menu M and choose to remove these clusters from the menu, replacing them on the menu with a single aggregate cluster M* containing all the target objects from clusters C1, C2, . . . , Ck. In this case, the immediate subclusters of new cluster M* are either taken to be clusters C1, . . . , Ck themselves, or else, in a variation similar to the "scatter-gather" method, are automatically computed by clustering the set of all the subclusters of clusters C1, C2, . . . , Ck according to the similarity of the cluster profiles of these subclusters.
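The association score and the query-dependent term in q(U, X) defined earlier in this passage can be sketched as below. This is a minimal illustration under assumptions the text does not fix: accesses are logged as (object, relevance feedback, active query terms) tuples, a term's "global frequency in all queries" is read as the fraction of queries containing it, and the proportionality constant `kappa` and all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative sketch only: the access-log format, the frequency definition,
# 'kappa', and the function names are assumptions, not from the source.
# One logged access: (target object id, relevance feedback, terms of the
# query active at the time; empty set if no query was active).
Access = Tuple[str, float, Set[str]]


def query_term_frequency(queries: List[Set[str]]) -> Dict[str, float]:
    """Assumed 'global frequency': fraction of all queries containing the term."""
    counts: Dict[str, int] = defaultdict(int)
    for query in queries:
        for term in query:
            counts[term] += 1
    return {term: c / len(queries) for term, c in counts.items()}


def association_scores(accesses: Iterable[Access],
                       freq: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """score(X, T) = mean feedback on X over accesses made while a query
    containing T was active, times -log(global frequency of T).

    Assumes every term in the access log also appears in 'freq'.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for obj, feedback, terms in accesses:
        for term in terms:
            totals[(obj, term)] += feedback
            counts[(obj, term)] += 1
    return {key: (totals[key] / counts[key]) * -math.log(freq[key[1]])
            for key in totals}


def query_quality_term(obj: str, active_query: Set[str],
                       scores: Dict[Tuple[str, str], float],
                       kappa: float = 1.0) -> float:
    """Term added to q(U, X): kappa times the sum of X's association
    scores with the terms of the active query (0 if no association)."""
    return kappa * sum(scores.get((obj, term), 0.0) for term in active_query)
```

Accesses made with no active query contribute nothing to any score, and a term that appears in every query gets a weight of -log(1) = 0, so very common query words do not inflate the association.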

Electronic Mall

In one application, the browsing techniques described above may be applied to a domain where the target objects are purchasable goods. When shoppers look for goods to purchase over the Internet or other electronic media, it is typically necessary to display thousands or tens of thousands of products in a fashion that helps consumers find the items they are looking for. The current practice is to use hand-crafted menus and sub-menus in which similar items are grouped together. It is possible to use the automated clustering and browsing methods described above to more effectively group and present the items. Purchasable items can be hierarchically clustered using a plurality of different criteria. Useful attributes for a purchasable item include but are not limited to a textual description and predefined category labels (if available), the unit price of the item, and an associative attribute listing the users who have bought this item in the past. Also useful is an associative attribute indicating which other items are often bought on the same shopping "trip" as this item; items that are often bought on the same trip will be judged similar with respect to this attribute, so tend to be grouped together. Retailers may be interested in utilizing a similar technique for purposes of predicting both the nature and relative quantity of items which are likely to be popular to their particular clientele. This prediction may be made by using aggregate purchasing records as the search profile set from which a collection of target objects is recommended. Estimated customer demand, which is indicative of (relative) inventory quantity for each target object item, is determined by measuring the cluster variance of that item compared to another target object item (which is in stock).
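The "same trip" associative attribute can be illustrated with a short sketch. It assumes shopping trips are available as sets of item identifiers and, as an illustrative choice not specified in the text, measures two items' similarity with respect to this attribute by the Jaccard overlap of the trips on which each was bought. In the full system this would be only one attribute among several feeding the clustering step; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Illustrative sketch only: Jaccard overlap and the function names are
# assumptions; the source only says co-purchased items are judged similar.


def trips_by_item(trips: List[Set[str]]) -> Dict[str, Set[int]]:
    """Associative attribute: for each item, the shopping trips
    (identified by list index) on which it was bought."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for trip_id, basket in enumerate(trips):
        for item in basket:
            index[item].add(trip_id)
    return dict(index)


def same_trip_similarity(a: Set[int], b: Set[int]) -> float:
    """Assumed measure: Jaccard overlap of two items' trip sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_pairs(trips: List[Set[str]], k: int) -> List[Tuple[float, str, str]]:
    """Return the k item pairs most often bought on the same trip,
    highest similarity first (ties broken alphabetically)."""
    index = trips_by_item(trips)
    pairs = [
        (same_trip_similarity(index[x], index[y]), x, y)
        for x, y in combinations(sorted(index), 2)
    ]
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    return pairs[:k]
```

Items with a high score here would tend to land in the same cluster, and hence on the same menu or "shelf", which is the grouping effect the text attributes to this attribute.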

As described above, hierarchically clustering the purchasable target objects results in a hierarchical menu system, in which the target objects or clusters of target objects that appear on each menu can be labeled by names or icons and displayed in a two-dimensional or three-dimensional menu in which similar items are displayed physically near each other or on the same graphically represented "shelf." As described above, this grouping occurs both at the level of specific items (such as standard size Ivory soap or large Breck shampoo) and at the level of classes of items (such as soaps and shampoos). When the user selects a class of items (for instance, by clicking on it), then the more specific level of detail is displayed. It is neither necessary nor desirable to limit each item to appearing in one group; customers are more likely to find an object if it is in multiple categories. Non-purchasable objects such as artwork, advertisements, and free samples may also be added to a display of purchasable objects, if they are associated with (liked by) substantially the same users as are the purchasable objects in the display.
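The rule for adding non-purchasable objects to a display can be sketched as follows. The text does not define "substantially the same users," so this sketch assumes it means that at least a fraction `min_overlap` (a hypothetical parameter, default 0.5) of the users who like a non-purchasable object are also users associated with the purchasable items already in the display; the function names and data layout are likewise hypothetical.

```python
from typing import Dict, List, Set

# Illustrative sketch only: the overlap rule, 'min_overlap', and the
# function names are assumptions, not from the source.


def display_audience(display: List[str], buyers: Dict[str, Set[str]]) -> Set[str]:
    """Users associated with (who bought) any purchasable item in the display."""
    audience: Set[str] = set()
    for item in display:
        audience |= buyers.get(item, set())
    return audience


def extras_for_display(display: List[str],
                       buyers: Dict[str, Set[str]],
                       likers: Dict[str, Set[str]],
                       min_overlap: float = 0.5) -> List[str]:
    """Non-purchasable objects whose likers are mostly users of the display.

    'Substantially the same users' is approximated as: at least
    min_overlap of the object's likers belong to the display's audience.
    Objects nobody likes are skipped. Results are in sorted order.
    """
    audience = display_audience(display, buyers)
    chosen: List[str] = []
    for obj, fans in sorted(likers.items()):
        if fans and len(fans & audience) / len(fans) >= min_overlap:
            chosen.append(obj)
    return chosen
```

Measuring overlap relative to the object's own likers, rather than to the whole audience, keeps a niche advertisement eligible even when the display's audience is large.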

Network Context of the Browsing System 

The files associated with target objects are typically distributed across a large number of different servers S1-So and clients C1-Cn. Each file has been entered into the data storage medium at some server or client in any one of a number of ways, including, but not limited to: scanning, keyboard input, e-mail, FTP transmission, automatic synthesis from another file under the control of another computer program. While a system to enable users to efficiently locate target objects may store its hierarchical cluster tree on a single centralized machine, greater efficiency can be achieved if the storage of the hierarchical cluster tree is distributed across many machines in the network. Each cluster C, including single-member clusters (target objects), is represented by a file F which is multicast to a [...] multicast tree MT(C1); here cluster C1 is either cluster C itself or some supercluster [...] file F is stored at multiple servers, for redundancy. The file F that represents cluster C contains at least the following: 1. The cluster profile for cluster C, or data sufficient to reconstruct the cluster profile. 2. The number of target objects contained in cluster C. 3. A human-readable label for cluster C, as described in section "Labeling clusters" [...] is divided [...] pointers to files representing the subclusters [...] pointer is an ordered pair naming, first, a file, and second, a multicast tree or a specific server where that file is stored. [...] cluster C [...] a single target object, a pointer to a file [...] the target object.

The process by which a client machine can retrieve the file F [...] above in section "Retrieving [...]". [...] retrieved file [...] can [...] to this cluster, such as displaying a labeled menu of [...] subclusters, from which the user may select subclusters for the client to retrieve next. [...] and data retrieval can be carried out concurrently. Second, [...] to redundancy inherent in our design: data is replicated at tree sites, so that even if a server is down, the data can be located elsewhere.

The distributed hierarchical cluster tree can be created in a distributed fashion, that is, [...] from time to time, because as users interact with target objects, [...] in the target profiles of [...] target objects change to reflect these interactions; the system's similarity measurements can therefore take these interactions into account when judging similarity, which allows a [...] perspicuous cluster tree [...] is the following procedure for merging n disjoint cluster trees, represented respectively by files F1 . . . Fn in distributed fashion as described above, into a combined cluster tree that contains all the target objects from all these trees. The files F1 . . . Fn are described above, except that the cluster [...] steps are executed by a server S1 in response to a request message from another server S0, which request message includes pointers to the files F1 . . . Fn. 1. Retrieve files F1 . . . Fn. 2. Let L and M be empty lists. 3. For each file Fi from among F1 . . . Fn: 4. If file Fi contains pointers to subcluster files, add these pointers to list L. 5. If file Fi represents a single target object, add a pointer to file Fi to list L. 6. For each pointer X on list L, retrieve the file that pointer X points to and extract the cluster profile P(X) that this file stores. 7. Apply a clustering algorithm to group the pointers X on list L according to the distances between their respective cluster profiles P(X). 8. For each (nonempty) resulting group C of pointers: 9. If C contains only one pointer, add this pointer to list M; 10. Otherwise, if C contains exactly the same subcluster pointers as does one of the files Fi from among F1 . . . Fn, then add a pointer to file Fi to list M; 11. Otherwise: 12. Select an arbitrary server S2 on the network, for example by randomly selecting one of the pointers in group C and choosing the server it points to. 13. Send a request message [...] a pointer to a file G that represents the merged tree. Add this [...] M does not include [...] Fi, send a [...] to delete file Fi. 17. Create and store a file F that represents [...] subcluster pointers on list M. 18. Send a reply message to [...]

[...] tree [...] that includes all proxy servers in the network, [...] is elected from the [...]. Server S sends itself a request message that causes each [...] (each proxy server in the network) to ask its clients for files for the cluster tree. 3. The clients of each proxy server transmit to the proxy server any files that they maintain which files represent target objects from the appropriate domain that should be [...] stored entirely on S1, but may in principle be stored in a distributed fashion. (b) Wait until all servers to which the server S1 has propagated request R have sent the [...] together the cluster tree created in step 5(a) and the cluster trees supplied in step 5(b), by sending any server [...] (d) Upon receiving a reply to the message [...] 5(c), which reply includes a pointer to a file [...] an embedded request R. 6. Server S receives a reply to the message it sent in 5(c). This reply includes a pointer to a file F that represents the completed hierarchical cluster tree. Server S multicasts file F to all [...] the hierarchical cluster tree [...] has been created as above, server S can send additional messages through the cluster tree, to arrange that multicast [...] for [...] clusters C [...] that each file F is multicast to the tree MT(C), where C is the smallest cluster containing file F.

MATCHING USERS FOR VIRTUAL COMMUNITIES

Virtual Communities

Computer users frequently join other users for discussions on computer bulletin boards, newsgroups, mailing lists, and real-time chat sessions over the computer network, which may be typed (as with Internet Relay Chat (IRC)), spoken
(as with Internet phone), or videoconferenced. These forums are herein termed "virtual communities." In current practice, each virtual community has a specified topic, and users discover communities of interest by word of mouth or by examining a long list of communities (typically hundreds or thousands). The users then must decide for themselves which of thousands of messages they find interesting from among those posted to the selected virtual communities, that is, made publicly available to members of those communities. If they desire, they may also write additional messages and post them to the virtual communities of their choice. The existence of thousands of Internet bulletin boards (also termed newsgroups) and countless more Internet mailing lists and private bulletin board services (BBS's) demonstrates the very strong interest among members of the electronic community in forums for the discussion of ideas about almost any subject imaginable. Presently, virtual community creation proceeds in a haphazard form, usually instigated by a single individual who decides that a topic is worthy of discussion. There are protocols on the Internet for voting to determine whether a newsgroup should be created, but there is a large hierarchy of newsgroups (which begin with the prefix "alt") that do not follow this protocol.

The system for customized electronic identification of desirable objects described herein can of course function as a browser for bulletin boards, where target objects are taken to be bulletin boards, or subtopics of bulletin boards, and each target profile is the cluster profile for a cluster of documents posted on some bulletin board. Thus, a user can locate bulletin boards of interest by all the navigational techniques described above, including browsing and querying. However, this method only serves to locate existing virtual communities. Because people have varied and varying interests, it is desirable to automatically locate groups of people with common interests in order to form virtual communities. The Virtual Community Service (VCS) described below is a network-based agent that seeks out users of a network with common interests, dynamically creates bulletin boards or electronic mailing lists for those users, and introduces them to each other electronically via e-mail. It is useful to note that once virtual communities have been created by VCS, the other browsing and filtering technologies described above can subsequently be used to help a user locate particular virtual communities (whether pre-existing or automatically generated by VCS); similarly, since the messages sent to a given virtual community may vary in interest and urgency for a user who has joined that community, these browsing and filtering technologies (such as the e-mail filter) can also be used to alert the user to urgent messages and to screen out uninteresting ones.

The functions of the Virtual Community Service are general functions that could be implemented on any network ranging from an office network in a small company to the World Wide Web or the Internet. The four main steps in the procedure are: 1. Scan postings to existing virtual communities. 2. Identify groups of users with common interests. 3. Match users with virtual communities, creating new virtual communities when necessary. 4. Continue to enroll additional users in the existing virtual communities.

More generally, users may post messages to virtual communities pseudonymously, even employing different pseudonyms for different virtual communities. (Posts not employing a pseudonymous mix path may, as usual, be considered to be posts employing a non-secure pseudonym, namely the user's true network address.) Therefore, the above steps may be expressed more generally as follows: 1. Scan pseudonymous postings to existing virtual communities. 2. Identify groups of pseudonyms whose associated users have common interests. 3. Match pseudonymous users with virtual communities, creating new virtual communities when necessary. 4. Continue to enroll additional pseudonymous users in the existing virtual communities.

Each of these steps can be carried out as described below.

Scanning

Using the technology described above, Virtual Community Service constantly scans all the messages posted to all the newsgroups and electronic mailing lists on a given network, and constructs a target profile for each message found. The network can be the Internet, or a set of bulletin boards maintained by America Online, Prodigy, or CompuServe, or a smaller set of bulletin boards that might be local to a single organization, for example a large company, a law firm, or a university. The scanning activity need not be confined to bulletin boards and mailing lists that were created by Virtual Community Service; it can also be used to scan the activity of communities that predate Virtual Community Service or are otherwise created by means outside the Virtual Community Service system, provided that these communities are public or otherwise grant their permission.

The target profile of each message includes textual attributes specifying the title and body text of the message. In the case of a spoken rather than written message, the latter attribute may be computed from the acoustic speech data by using a speech recognition system. The target profile also includes an associative attribute listing the author(s) and designated recipient(s) of the message, where the recipients may be individuals and/or entire virtual communities; if this attribute is highly weighted, then the system tends to regard messages among the same set of people as being similar or related, even if the topical similarity of the messages is not clear from their content, as may happen when some of the messages are very short. Other important attributes include the fraction of the message that consists of quoted material from previous messages, as well as attributes that are generally useful in characterizing documents, such as the message's date, length, and reading level.

Virtual Community Identification

Next, Virtual Community Service attempts to identify groups of pseudonymous users with common interests. These groups, herein termed "pre-communities," are represented as sets of pseudonyms. Whenever Virtual Community Service identifies a pre-community, it will subsequently attempt to put the users in said pre-community in contact with each other, as described below. Each pre-community is said to be "determined" by a cluster of messages, pseudonymous users, search profiles, or target objects.

In the usual method for determining pre-communities, Virtual Community Service clusters the messages that were scanned and profiled in the above step, based on the similarity of those messages' computed target profiles, thus automatically finding threads of discussion that show common interests among the users. Naturally, discussions in a single virtual community tend to show common interests; however, this method uses all the texts from every available virtual community, including bulletin boards and electronic mailing lists. Indeed, a user who wishes to initiate or join a discussion on some topic may send a "feeler message" on that topic to a special mailing list designated for feeler messages; as a consequence of the scanning procedure described above, the feeler message is automatically grouped with any similarly profiled messages that have been sent to this special mailing list, to topical mailing lists, or to topical bulletin boards. The clustering step employs "soft
clustering," in which a message may belong to multiple clusters and hence to multiple virtual communities. Each cluster of messages that is found by Virtual Community Service and that is of sufficient size (for example, 10-20 different messages) determines a pre-community whose members are the pseudonymous authors and recipients of the messages in the cluster. More precisely, the pre-community consists of the various pseudonyms under which the messages in the cluster were sent and received.

Alternative methods for determining a pre-community, which do not require the scanning step above, include the following: 1. Pre-communities can be generated by grouping together users who have similar interests of any sort, not merely individuals who have already written or received messages about similar topics. If the user profile associated with each pseudonym indicates the user's interests, for example through an associative attribute that indicates the documents or Web sites a user likes, then pseudonyms can be clustered based on the similarity of their associated user profiles, and each of the resulting clusters of pseudonyms determines a pre-community comprising the pseudonyms in the cluster. 2. If each pseudonym has an associated search profile set formed through participation in the news clipping service described above, then all search profiles of all pseudonymous users can be clustered based on their similarity, and each cluster of search profiles determines a pre-community whose members are the pseudonyms from whose search profile sets the search profiles in the cluster are drawn. Such groups of people have been reading about the same topic (or, more generally, accessing similar target objects) and so presumably share an interest. 3. If users participate in a news clipping service or any other filtering or browsing system for target objects, then an individual user can pseudonymously request the formation of a virtual community to discuss a particular cluster of one or more target objects known to that system. This cluster of target objects determines a pre-community consisting of the pseudonyms of users determined to be most interested in that cluster (for example, users who have search profiles similar to the cluster profile), together with the pseudonym of the user who requested formation of the virtual community.

Matching Users with Communities

Once Virtual Community Service identifies a cluster C of messages, users, search profiles, or target objects that determines a pre-community M, it attempts to arrange for the members of this pre-community to have the chance to participate in a common virtual community V. In many cases, an existing virtual community V may suit the needs of the pre-community M. Virtual Community Service first attempts to find such an existing community V. In the case where cluster C is a cluster of messages, V may be chosen to be any existing virtual community such that the cluster profile of cluster C is within a threshold distance of the mean profile of the set of messages recently posted to virtual community V; in the case where cluster C is a cluster of users, V may be chosen to be any existing virtual community such that the cluster profile of cluster C is within a threshold distance of the mean user profile of the active members of virtual community V; in the case where the cluster C is a cluster of search profiles, V may be chosen to be any existing virtual community such that the cluster profile of cluster C is within a threshold distance of the cluster profile of the largest cluster resulting from clustering all the search profiles of active members of virtual community V; and in the case where the cluster C is a cluster of one or more target objects chosen from a separate browsing or filtering system, V may be chosen to be any existing virtual community initiated in the same way from a cluster whose cluster profile in that other system is within a threshold distance of the cluster profile of cluster C. The threshold distance used in each case is optionally dependent on the cluster variance or cluster diameter of the profile sets whose means are being compared.

If no existing virtual community V meets these conditions and is also willing to accept all the users in pre-community M as new members, then Virtual Community Service attempts to create a new virtual community V. Regardless of whether virtual community V is an existing community or a newly created community, Virtual Community Service sends an e-mail message to each pseudonym P in pre-community M whose associated user U does not already belong to virtual community V (under pseudonym P) and has not previously turned down a request to join virtual community V. The e-mail message informs user U of the existence of virtual community V, and provides instructions which user U may follow in order to join virtual community V if desired; these instructions vary depending on whether virtual community V is an existing community or a new community. The message includes a credential, granted to pseudonym P, which credential must be presented by user U upon joining the virtual community V, as proof that user U was actually invited to join. If user U wishes to join virtual community V under a different pseudonym Q, user U may first transfer the credential from pseudonym P to pseudonym Q, as described above. The e-mail message further provides an indication of the common interests of the community, for example by including a list of titles of messages recently sent to the community, or a charter or introductory message provided by the community (if available), or a label generated by the methods described above that identifies the content of the cluster of messages, user profiles, search profiles, or target objects that was used to identify the pre-community M.

If Virtual Community Service must create a new community V, several methods are available for enabling the members of the new community to communicate with each other. If the pre-community M is large, for example containing more than 50 users, then Virtual Community Service typically establishes either a multicast tree, as described below, or a widely-distributed bulletin board, assigning a name to the new bulletin board. If the pre-community M has fewer members, for example 2-50, Virtual Community Service typically establishes either a multicast tree, as described below, or an e-mail mailing list. If the new virtual community V was determined by a cluster of messages, then Virtual Community Service kicks off the discussion by distributing these messages to all members of virtual community V. In addition to bulletin boards and mailing lists, alternative forums that can be created and in which virtual communities can gather include real-time typed or spoken conversations (or engagement in distributed multi-user applications including video games) over the computer network, and physical meetings, any of which can be scheduled by a partly automated process wherein Virtual Community Service requests meeting time preferences from all members of the pre-community M and then notifies these individuals of an appropriate meeting time.

Continued Enrollment

Even after creation of a new virtual community, Virtual Community Service continues to scan other virtual communities for new messages whose target profiles are similar to the community's cluster profile (average message profile). Copies of any such messages are sent to the new virtual community, and the pseudonymous authors of these
messages, as well as users who show high interest in reading such messages, are informed by Virtual Community Service (as for pre-community members, above) that they may want to join the community. Each such user can then decide whether or not to join the community. In the case of Internet Relay Chat (IRC), if the target profile of messages in a real-time dialog is (or becomes) similar to that of a user, VCS may also send an urgent e-mail message to that user, whereby the user may be automatically notified as soon as the dialog appears, if desired.

With these facilities, Virtual Community Service provides automatic creation of new virtual communities in any local or wide-area network, as well as maintenance of all virtual communities on the network, including those not created by Virtual Community Service. The core technology underlying Virtual Community Service is a search and clustering mechanism that can find articles that are "similar" in that the users share interests. This is precisely what was described above. One must be sure that Virtual Community Service does not bombard users with notices about communities in which they have no real interest. On a very small network a human could be "in the loop", scanning proposed virtual communities and perhaps even giving them names. But on larger networks Virtual Community Service has to run in fully automatic mode, since it is likely to find a large number of virtual communities.

Delivering Messages to a Virtual Community

Once a virtual community has been identified, it is straightforward for Virtual Community Service to establish a mailing list so that any member of the virtual community may distribute e-mail to all other members. Another method of distribution is to use a conventional network bulletin board or newsgroup to distribute the messages to all servers in the network, where they can be accessed by any member of the virtual community. However, these simple methods do not take into account the cost and performance advantages which accrue from optimizing the construction of a multicast tree to carry messages to the virtual community. Unlike a newsgroup, a multicast tree distributes messages to only a selected set of servers, and unlike an e-mail mailing list, it does so efficiently.

A separate multicast tree MT(V) is maintained for each virtual community V, by use of the following four procedures.

1. To construct or reconstruct this multicast tree, the core servers for virtual community V are taken to be those proxy servers that serve at least one pseudonymous member of virtual community V. Then the multicast tree MT(V) is established via steps 4-6 in the section "Multicast Tree Construction Procedure" above.

2. When a new user joins virtual community V, which is an existing virtual community, the user sends a message to the user's proxy server S. If the user's proxy server S is not already a core server for V, then it is designated as a core server and is added to the multicast tree MT(V), as follows. If more than k servers have been added since the last time the multicast tree MT(V) was rebuilt, where k is a function of the number of core servers already in the tree, then the entire tree is simply rebuilt via steps 4-6 in the section "Multicast Tree Construction Procedure" above. Otherwise, server S retrieves its locally stored list of nearby core servers for V, and chooses a server S1. Server S sends a control message to S1, indicating that it would like to be added to the multicast tree MT(V). Upon receipt of this message, server S1 retrieves its locally stored subtree G1 of MT(V), and forms a new graph G from G1 by removing all degree-1 vertices other than S1 itself. Server S1 transmits graph G to server S, which stores it as its locally stored subtree of MT(V). Finally, server S sends a message to itself and to all servers that are vertices of graph G, instructing these servers to modify their locally stored subtrees of MT(V) by adding S as a vertex and adding an edge between S1 and S.

3. When a user at a client q wishes to send a message F to virtual community V, client q embeds message F in a request R instructing the recipient to store message F locally, for a limited time, for access by members of virtual community V. Request R includes a credential proving that the user is a member of virtual community V or is otherwise entitled to post messages to virtual community V (for example, is not "black marked" by that or other virtual community members). Client q then broadcasts request R to all core servers in the multicast tree MT(V), by means of a global request message transmitted to the user's proxy server as described above. The core servers satisfy request R, provided that they can verify the included credential.

4. In order to retrieve a particular message sent to virtual community V, a user U at client q initiates the steps described in the section "Retrieving Files from a Multicast Tree" above. If user U does not want to retrieve a particular message, but rather wants to retrieve all new messages sent to virtual community V, then user U pseudonymously instructs its proxy server (which is a core server for V) to send it all messages that were multicast to MT(V) after a certain date. In either case, user U must provide a credential proving user U to be a member of virtual community V, or otherwise entitled to access messages on virtual community V.

Summary

A method has been presented for automatically selecting articles of interest to a user. The method generates sets of search profiles for the users based on such attributes as the relative frequency of occurrence of words in the articles read by the users, and uses these search profiles to efficiently identify future articles of interest. The method is characterized by passive monitoring (users do not need to explicitly rate the articles), multiple search profiles per user (reflecting interest in multiple topics), and use of elements of the search profiles which are automatically determined from the data (notably, the TF/IDF measure based on word frequencies and descriptions of purchasable items). A method has also been presented for automatically generating menus to allow users to locate and retrieve articles on topics of interest. This method clusters articles based on their similarity, as measured by the relative frequency of word occurrences. Clusters are labeled either with article titles or with key words extracted from the article. The method can be applied to large sets of articles distributed over many machines.

It has been further shown how to extend the above methods from articles to any target objects for which profiles can be generated, including news articles, reference or work articles, electronic mail, product or service descriptions, people (based on the articles they read, demographic data, or the products they buy), and electronic bulletin boards (based on the articles posted to them). A particular consequence of being able to group people by their interests is that one can form virtual communities of people of common interest who can then correspond with one another via electronic mail.

We claim:

1. A method for automatically providing a user with confidential access to selected ones of a plurality of target objects and sets of target object characteristics that are accessible via an electronic storage media, where said users are connected via user terminals and data communication
connections to a target server system which accesses said electronic storage media, said method comprising the steps of:
confidentially generating a user pseudonym at a proxy server, which pseudonym is unique to said user, by means of authenticated user credentials provided by an authenticating entity;
mapping a user target profile interest summary indicative of said user's access patterns to target objects and sets of target object characteristics to said user pseudonym;
enabling access by said user to said plurality of target objects and sets of target object characteristics stored on said electronic storage media via said user target profile interest summary associated with said user's pseudonym; and
confidentially routing target objects and sets of target object characteristics, retrieved in said step of enabling access, to said user.

2. The method of claim 1 further comprising the step of:
assuring at said authenticating entity that said generated pseudonym has unalterable stipulations for qualifying said user, which untamperable stipulations result from the checking of said user credentials by said authenticating entity.

3. The method of claim 2 further comprising the step of:
assuring that said user target profile interest summary contains said user credentials in an untamperable portion of said user target profile interest summary associated with said pseudonym.

4. The method of claim 1 wherein said target server system is connected to said user terminal via said proxy server, said step of mapping comprises:
mediating in said proxy server between said target server system and said user terminal.

5. The method of claim 4 wherein said proxy server maintains and updates said user target profile interest summary.

6. The method of claim 4 wherein said proxy server may mediate between the user and other parties such as, but not limited to, publicly known target servers, pseudonymous target-object server entities, publicly known individual users, and other pseudonymous individual users.

7. The method of claim 6 wherein said proxy server regulates access to said user target profile interest summaries by parties other than said user, said step of mapping comprises:
storing data in a memory indicative of characteristics of parties who said user desires to have access to said user target profile interest summary;
correlating data indicative of characteristics of a one of said parties with said stored data to determine whether said one party is authorized to access said user target profile interest summary; and
enabling said one party to access said user target profile interest summary when said step of correlating determines authorization of access.

8. The method of claim 6 wherein said stored data comprises a party profile which defines which of said parties can access said user target profile interest summary and access condition data which instructs said proxy server of the manner of said regulated access.

9. The method of claim 8 where said regulated access to said user target profile interest summary may be specific to allowing the release of certain classes of transactions and other second order attributes synthesized from raw transaction and viewing duration information as dictated by said user's access rules.

10. The method of claim 4 wherein said step of mapping further comprises:
enabling said user to access said memory to input said stored data.

11. The method of claim 3 wherein said proxy server regulates access to said user by parties other than said user, said step of mapping comprises:
storing data indicative of characteristics of parties who said user desires to have access to said user;
correlating data indicative of characteristics of a one of said parties with said stored data to determine whether said one party is authorized to access said user; and
enabling said one party to access said user when said step of correlating determines authorization of access.

12. The method of claim 11 wherein said step of enabling access comprises:
receiving a message from said one party;
placing said received message in a packet confidentially addressed to said user; and
transmitting said message to said user terminal.

13. The method of claim 11 wherein said proxy server aggregates said user target profile interest summary and/or specific transaction information in a set of statistics which may be provided to target-object providers or other parties desiring said information, possibly in exchange for cash-money or other considerations to be provided to the agency that operates the pseudonymous server.

14. The method of claim 1 wherein said step of confidentially generating comprises:
accessing a validating server to enable said validating server to authenticate identity of said user.

15. The method of claim 2 wherein said step of mapping comprises:
transmitting user target profile interest summary data, indicative of user target object access activity, from said user terminal to said proxy server; and
updating said user target profile interest summary with said received user target profile set data.

16. The method of claim 1 wherein said step of confidentially generating is responsive to a request received from said user for generating a replacement user pseudonym which is unique to said user who already has an issued pseudonym by means of an authenticated credential provided by an authenticating entity to thereby ensure user anonymity.

17. The method of claim 2 wherein said authenticating entity includes authenticated attributes for said user.

18. The method of claim 1 wherein said target object comprises purchasable goods, said method further comprising the steps of:
transmitting data from said user terminal to said target server system indicative of said user's authorization to purchase identified ones of said purchasable goods;
accessing a credit server via said proxy server to process a financial transaction to debit said user for purchase of said identified purchasable goods.

19. Apparatus for automatically providing a user with confidential access to selected ones of a plurality of target objects and sets of target object characteristics that are accessible via an electronic storage media, where said users are connected via user terminals and data communication connections to a target server system which accesses said electronic storage media, said method comprising:
means for confidentially generating a user pseudonym at a proxy server, which pseudonym is unique to said user, by means of authenticated user credentials provided by an authenticating entity;
means for mapping a user target profile interest summary indicative of said user's access patterns to target objects and sets of target object characteristics to said user pseudonym;
means for enabling access by said user to said plurality of target objects and sets of target object characteristics stored on said electronic storage media via said user target profile interest summary associated with said user's pseudonym; and
means for confidentially routing target objects and sets of target object characteristics, retrieved in said step of enabling access, to said user.

20. The apparatus of claim 19 further comprising:
means for assuring at said authenticating entity that said generated pseudonym has untamperable stipulations for qualifying said user, which untamperable stipulations result from the checking of said user credentials by said authenticating entity.

21. The apparatus of claim 20 further comprising:
means for assuring that said user target profile interest summary contains said user credentials in an untamperable portion of said user target profile interest summary associated with said pseudonym.

22. The apparatus of claim 19 wherein said target server system is connected to said user terminal via said proxy server, said means for mapping comprises:
means for mediating in said proxy server between said target server system and said user terminal.

23. The apparatus of claim 22 wherein said proxy server maintains and updates said user target profile interest summary.

24. The apparatus of claim 22 wherein said proxy server may mediate between the user and other parties such as, but not limited to, publicly known target servers, pseudonymous target-object server entities, publicly known individual users, and other pseudonymous individual users.

25. The apparatus of claim 24 wherein said proxy server regulates access to said user target profile interest summaries by parties other than said user, said means for mapping comprises:
means for storing data in a memory indicative of characteristics of parties who said user desires to have access to said user target profile interest summary;
means for correlating data indicative of characteristics of a one of said parties with said stored data to determine whether said one party is authorized to access said user target profile interest summary; and
means for enabling said one party to access said user target profile interest summary when said step of correlating determines authorization of access.

26. The apparatus of claim 24 wherein said stored data comprises a party profile which defines which of said parties can access said user target profile interest summary and access condition data which instructs said proxy server of the manner of said regulated access.

27. The apparatus of claim 26 where said regulated access to said user target profile interest summary may be specific to allowing the release of certain classes of transactions and other second order attributes synthesized from raw transaction and viewing duration information as dictated by said user's access rules.

28. The apparatus of claim 22 wherein said means for mapping further comprises:
means for enabling said user to access said memory to input said stored data.

29. The method of claim 21 wherein said proxy server regulates access to said user by parties other than said user, said means for mapping comprises:
means for storing data indicative of characteristics of parties who said user desires to have access to said user;
means for correlating data indicative of characteristics of a one of said parties with said stored data to determine whether said one party is authorized to access said user; and
means for enabling said one party to access said user when said step of correlating determines authorization of access.

30. The method of claim 29 wherein said means for enabling access comprises:
means for receiving a message from said one party;
means for placing said received message in a packet confidentially addressed to said user; and
means for transmitting said message to said user terminal.

31. The apparatus of claim 29 wherein said proxy server aggregates said user target profile interest summary and/or specific transaction information in a set of statistics which may be provided to target-object providers or other parties desiring said information, possibly in exchange for cash-money or other considerations to be provided to the agency that operates the pseudonymous server.

32. The apparatus of claim 19 wherein said means for confidentially generating comprises:
means for accessing a validating server to enable said validating server to authenticate identity of said user.

33. The apparatus of claim 20 wherein said means for mapping comprises:
means for transmitting user target profile interest summary data, indicative of user target object access activity, from said user terminal to said proxy server; and
means for updating said user target profile interest summary with said received user target profile set data.

34. The apparatus of claim 19 wherein said means for confidentially generating is responsive to a request received from said user for generating a replacement user pseudonym which is unique to said user who already has an issued pseudonym by means of an authenticated credential provided by an authenticating entity to thereby ensure user anonymity.

35. The apparatus of claim 20 wherein said authenticating entity includes authenticated attributes for said user.

36. The apparatus of claim 19 wherein said target object comprises purchasable goods, said apparatus further comprises:
means for transmitting data from said user terminal to said target server system indicative of said user's authorization to purchase identified ones of said purchasable goods;
means for accessing a credit server via said proxy server to process a financial transaction to debit said user for purchase of said identified purchasable goods.

* * * * *
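The matching rule described under "Matching Users with Communities" (reuse an existing virtual community whose mean recent-message profile lies within a threshold distance of the cluster profile), together with the size-based choice of delivery mechanism for a new community, can be illustrated with a small Python sketch. It assumes profiles are sparse term-to-weight dictionaries and that "distance" means one minus cosine similarity; the text does not fix a metric, and the names `find_existing_community` and `delivery_options` and the default threshold of 0.3 are hypothetical. Where several communities qualify, the sketch picks the closest one, although the text allows any qualifying community.

```python
import math
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, float]  # assumed representation: term -> weight


def mean_profile(profiles: List[Profile]) -> Profile:
    """Component-wise mean of sparse profiles (e.g. recently posted messages)."""
    if not profiles:
        return {}
    total: Profile = {}
    for p in profiles:
        for term, weight in p.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: w / len(profiles) for term, w in total.items()}


def cosine_distance(a: Profile, b: Profile) -> float:
    """1 - cosine similarity (an assumed metric); 1.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def find_existing_community(cluster_profile: Profile,
                            recent_posts: Dict[str, List[Profile]],
                            threshold: float = 0.3) -> Optional[str]:
    """Return the closest community whose mean recent-message profile lies
    within `threshold` of the cluster profile, or None if none qualifies."""
    best: Optional[str] = None
    best_distance = threshold
    for name, posts in recent_posts.items():
        d = cosine_distance(cluster_profile, mean_profile(posts))
        if d <= best_distance:
            best, best_distance = name, d
    return best


def delivery_options(pre_community_size: int) -> Tuple[str, ...]:
    """Typical mechanisms for a new community, using the example sizes in the text."""
    if pre_community_size > 50:
        return ("multicast tree", "bulletin board")
    if pre_community_size >= 2:
        return ("multicast tree", "e-mail mailing list")
    return ()  # the text gives no rule for fewer than two members
```

The same distance test could plausibly serve the "Continued Enrollment" scan, comparing each new message profile with a community's average message profile.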
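Procedure 2 under "Delivering Messages to a Virtual Community" (adding a new core server S to MT(V) through a nearby core server S1) can be sketched as follows. The undirected adjacency-set representation and the function names are hypothetical, and the rebuild threshold k is left as a parameter because the text says only that it is a function of the number of core servers already in the tree. The sketch performs exactly the single pruning pass the text describes: S1 removes every degree-1 vertex of its subtree G1 except itself, and S stores the result with S and the edge S1-S added.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # hypothetical: undirected tree as adjacency sets


def prune_leaves(g1: Graph, keep: str) -> Graph:
    """Form G from G1 by removing all degree-1 vertices other than `keep` (one pass)."""
    g = {v: set(nbrs) for v, nbrs in g1.items()}  # copy so G1 is left untouched
    leaves = [v for v, nbrs in g.items() if len(nbrs) == 1 and v != keep]
    for v in leaves:
        for u in g.pop(v):
            if u in g:
                g[u].discard(v)
    return g


def needs_rebuild(added_since_rebuild: int, k: int) -> bool:
    """Rebuild MT(V) from scratch once more than k servers have been added."""
    return added_since_rebuild > k


def join_via(s: str, s1: str, g1: Graph) -> Graph:
    """Subtree that S stores after joining through S1: G plus vertex S and edge S1-S.

    Per the text, S also instructs every vertex of G to apply the same
    vertex/edge addition to its own locally stored subtree (not modeled here).
    """
    g = prune_leaves(g1, keep=s1)
    g.setdefault(s, set()).add(s1)
    g.setdefault(s1, set()).add(s)
    return g
```

For example, if S1's subtree is S1-A-C plus a leaf B on S1, pruning removes B and C, and S ends up storing the small tree S-S1-A.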
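Procedures 3 and 4 under "Delivering Messages to a Virtual Community" imply that each core server keeps posted messages for a limited time and releases them only against a verifiable credential. The sketch below models one core server's store under stated assumptions: the credential check is a caller-supplied placeholder (`verify`), times are plain numbers, and "after a certain date" is read as a strict comparison. The class and method names are hypothetical and are not an interface defined in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CoreServerStore:
    """Hypothetical per-community message store held by one core server of MT(V)."""
    community: str
    verify: Callable[[str, str], bool]  # placeholder credential check (assumption)
    ttl: float                          # "limited time" a message is kept (assumed units)
    _messages: List[Tuple[float, str]] = field(default_factory=list)

    def _expire(self, now: float) -> None:
        # Drop messages whose retention period has elapsed.
        self._messages = [(t, m) for t, m in self._messages if now - t < self.ttl]

    def handle_post(self, message: str, credential: str, now: float) -> bool:
        """Procedure 3: store message F only if the request's credential verifies."""
        if not self.verify(credential, self.community):
            return False
        self._expire(now)
        self._messages.append((now, message))
        return True

    def messages_since(self, since: float, credential: str, now: float) -> List[str]:
        """Procedure 4: return unexpired messages posted after `since` to an entitled user."""
        if not self.verify(credential, self.community):
            raise PermissionError("credential not accepted for " + self.community)
        self._expire(now)
        return [m for t, m in self._messages if t > since]
```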
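The Summary states that search profiles rely on the TF/IDF measure, that is, how often a word occurs in an article relative to how widely it is used across all articles. A minimal Python sketch of one common variant follows; the exact weighting (relative term frequency multiplied by the natural log of the number of articles over the number of articles containing the word) is an assumption, since this section gives no formula, and `tfidf_profiles` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_profiles(docs: List[List[str]]) -> List[Dict[str, float]]:
    """One sparse TF/IDF profile per tokenized article (assumed weighting)."""
    n = len(docs)
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word at most once per article
    profiles = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        profiles.append({
            word: (count / total) * math.log(n / doc_freq[word])
            for word, count in counts.items()
        })
    return profiles
```

Words that occur in every article receive zero weight under this variant, which matches the intent of weighting a word's frequency in one article against its overall frequency of use.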